unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Timeseries data Augmentation #2253

Open Vignesh-Sampat opened 4 months ago

Vignesh-Sampat commented 4 months ago

Is there any way to include data augmentation strategies in the darts TimeSeries object?

madtoinou commented 4 months ago

Hi @Vignesh-Sampat,

What do you mean exactly by data augmentation? Could you please give examples?

Vignesh-Sampat commented 3 months ago

Data augmentation is a collection of methods used to enlarge a dataset by introducing modified versions of existing samples or by generating synthetic samples from them. Applying augmentation on the fly inside the dataloader, with each transform triggered with some probability (e.g., p = 0.5), would be preferable to the traditional approach of modifying the data, saving it, and then appending it to the existing dataset: each epoch sees freshly perturbed samples, and no augmented copies have to be stored.
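A minimal sketch of the on-the-fly idea in plain NumPy: every transform in a list is applied independently with probability p each time a sample is requested, so nothing is written to disk. The names random_apply and AugmentedDataset are hypothetical helpers, not part of the darts API, and hooking them into darts' own training datasets is not covered here.

```python
import numpy as np


def random_apply(transforms, p=0.5, rng=None):
    """Hypothetical helper: returns a callable that applies each transform
    independently with probability p on every call."""
    rng = np.random.default_rng() if rng is None else rng

    def augment(values):
        out = np.asarray(values, dtype=float)
        for transform in transforms:
            # Flip a biased coin per transform, per call
            if rng.random() < p:
                out = transform(out)
        return out

    return augment


class AugmentedDataset:
    """Hypothetical dataset wrapper: augmentation happens at read time,
    so each epoch sees new random variants of the stored samples."""

    def __init__(self, samples, augment):
        self.samples = samples
        self.augment = augment

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.augment(self.samples[idx])


# Usage: jitter and random scaling, each applied with probability 0.5
rng = np.random.default_rng(0)
augment = random_apply(
    [
        lambda x: x + rng.normal(0.0, 0.05, size=x.shape),  # jitter
        lambda x: x * rng.normal(1.0, 0.1),                  # random scaling
    ],
    p=0.5,
    rng=rng,
)
dataset = AugmentedDataset([np.sin(np.linspace(0, 6, 50))], augment)
sample = dataset[0]
```

Because augmentation happens in __getitem__, two reads of the same index generally return different arrays. That is the intended behavior for training; validation data should be left unaugmented.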

Below is a list of example functions that can be used to reduce the risk of overfitting:

Gaussian noise (jittering)

import numpy as np

def add_gaussian_noise(time_series, mean=0.0, stddev=1.0):
    """
    Adds Gaussian noise (jittering) to a time series.

    Args:
        time_series (array-like): Time series to which noise is added.
        mean (float): Mean of the noise. Default is 0.0.
        stddev (float): Standard deviation of the noise, in the units of the series. Default is 1.0.

    Returns:
        noisy_series (np.ndarray): Time series with added noise.
    """
    time_series = np.asarray(time_series, dtype=float)

    # Draw one noise value per element (works for 1-D and (time, component) arrays)
    noise = np.random.normal(mean, stddev, size=time_series.shape)

    # Add the noise to the original time series
    noisy_series = time_series + noise

    return noisy_series

augmented_time_series_data = add_gaussian_noise(time_series_data, mean=0.0, stddev=0.05)
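Note that stddev in add_gaussian_noise is absolute, so 0.05 is a large perturbation for a series in [0, 1] and negligible for one in the thousands. A sketch of a scale-relative variant, assuming the input is a NumPy array of shape (n_timesteps,) or (n_timesteps, n_components) (for example what TimeSeries.values() returns in darts); jitter_relative and sigma are hypothetical names:

```python
import numpy as np


def jitter_relative(values, sigma=0.03, rng=None):
    """Gaussian jitter whose standard deviation is sigma times each
    component's own standard deviation (axis 0 is time)."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    # Per-component spread; fall back to 1.0 for constant components
    scale = values.std(axis=0)
    scale = np.where(scale > 0, scale, 1.0)
    noise = rng.normal(0.0, 1.0, size=values.shape) * sigma * scale
    return values + noise


# Usage: two components on very different scales get proportionate noise
rng = np.random.default_rng(0)
series = np.column_stack([np.linspace(0, 1, 200), 1000 * np.linspace(0, 1, 200)])
augmented = jitter_relative(series, sigma=0.05, rng=rng)
```

With this parameterization sigma reads as "noise as a fraction of typical variation", so a single value can be reused across series with different units.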

Random scaling

def add_scaling(time_series, scale_factor):
    """
    Scales a time series by multiplying each element by scale_factor.

    :param time_series: numpy array, time series to be scaled
    :param scale_factor: the number by which all elements of the series are multiplied;
        for random scaling, draw a new value for each sample (e.g. close to 1.0)
    :return: numpy array, scaled time series
    """
    scaled_time_series = time_series * scale_factor
    return scaled_time_series

scale_factor = 0.48
augmented_time_series_data = add_scaling(time_series_data, scale_factor)
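As written, add_scaling is deterministic: the randomness has to come from how scale_factor is chosen, and a fixed 0.48 shrinks every series to about half. A sketch that draws the factor itself, one per component from N(1, sigma^2), assuming a NumPy array of shape (n_timesteps,) or (n_timesteps, n_components); random_scaling and sigma are illustrative names, not from the original:

```python
import numpy as np


def random_scaling(values, sigma=0.1, rng=None):
    """Multiply each component by its own random factor drawn from N(1, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    # One factor per component: shape () for 1-D input, (n_components,) for 2-D
    factors = rng.normal(1.0, sigma, size=values.shape[1:])
    return values * factors


# Usage: a 3-component series; each column gets a different constant factor
rng = np.random.default_rng(1)
scaled = random_scaling(np.ones((10, 3)), sigma=0.1, rng=rng)
```

Centering the factor on 1 keeps augmented samples near the original level; a small sigma such as 0.1 is a common starting point, but the right value depends on the data.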

Magnitude warping

import numpy as np
from scipy.interpolate import CubicSpline

def magnitude_warping(time_series, num_knots=4, warp_std_dev=0.2):
    """
    Applies magnitude warping to a time series using cubic splines.

    :param time_series: np.array of shape (n,), time series to distort
    :param num_knots: int, number of control points (knots) for the spline, at least 2
    :param warp_std_dev: float, standard deviation of the random knot values around 1
    :return: np.array, distorted time series
    """
    # Place knots evenly along the series, each with a random multiplicative factor around 1
    knot_positions = np.linspace(0, len(time_series) - 1, num=num_knots)
    knot_values = 1 + np.random.normal(0, warp_std_dev, num_knots)

    # Fit a smooth cubic spline through the knots
    spline = CubicSpline(knot_positions, knot_values)

    # Evaluate the spline at every time step
    time_indexes = np.arange(len(time_series))

    # Multiply the series by the smooth random curve
    warped_time_series = time_series * spline(time_indexes)

    return warped_time_series

augmented_time_series_data = magnitude_warping(time_series_data, num_knots=5, warp_std_dev=0.45)
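magnitude_warping above multiplies the series by a single spline curve of length n, so it expects a 1-D array. With an (n_timesteps, n_components) array the shapes do not broadcast, or, if n_timesteps happens to equal n_components, the curve is silently applied along the wrong axis. A sketch of a multivariate variant that gives each component its own random curve, relying on SciPy's CubicSpline interpolating along axis 0 of a 2-D y array; magnitude_warping_multi is a hypothetical name:

```python
import numpy as np
from scipy.interpolate import CubicSpline


def magnitude_warping_multi(values, num_knots=4, warp_std_dev=0.2, rng=None):
    """Magnitude warping with an independent smooth random curve per component."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    was_1d = values.ndim == 1
    if was_1d:
        values = values[:, None]  # treat a 1-D series as one component
    n_steps, n_components = values.shape

    knot_positions = np.linspace(0, n_steps - 1, num=num_knots)
    # One column of random knot values (around 1) per component
    knot_values = 1.0 + rng.normal(0.0, warp_std_dev, size=(num_knots, n_components))
    # CubicSpline interpolates along axis 0, so all components are fitted at once;
    # evaluating at n_steps points gives curves of shape (n_steps, n_components)
    curves = CubicSpline(knot_positions, knot_values)(np.arange(n_steps))

    warped = values * curves
    return warped[:, 0] if was_1d else warped


# Usage: two identical components end up with different warps
rng = np.random.default_rng(2)
warped = magnitude_warping_multi(np.ones((100, 2)), rng=rng)
```
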

Time warping

import random

import numpy as np

def time_warping(time_series, num_operations=10, warp_factor=0.2):
    """
    Applies time warping to a time series via random point insertions/deletions.

    Note: the output length generally differs from the input length.

    :param time_series: numpy array of shape (n,) or (n, n_components), with n >= 3.
    :param num_operations: number of insert/delete operations.
    :param warp_factor: relative size of the random perturbation added to inserted points.
    :return: distorted time series.
    """
    # Float copy, so inserted interpolated values are not truncated to integers
    warped_series = np.array(time_series, dtype=float)
    for _ in range(num_operations):
        if len(warped_series) < 3:
            break  # too short to pick an interior index
        operation_type = random.choice(["insert", "delete"])
        index = random.randint(1, len(warped_series) - 2)
        if operation_type == "insert":
            # Insert the midpoint of two adjacent points, perturbed by up to +/- warp_factor of its value
            insertion_value = (warped_series[index - 1] + warped_series[index]) * 0.5
            warp_amount = insertion_value * warp_factor * random.uniform(-1, 1)
            warped_series = np.insert(warped_series, index, insertion_value + warp_amount, axis=0)
        else:
            # Remove an interior point
            warped_series = np.delete(warped_series, index, axis=0)

    return warped_series

augmented_time_series_data = time_warping(time_series_data, num_operations=20, warp_factor=0.25)
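The time_warping function above inserts and deletes points, so its output length changes, which breaks alignment with the time index (and any covariates) of a darts TimeSeries. A common length-preserving alternative, swapped in here and not the original poster's method, bends the time axis with a smooth random monotonic mapping and resamples the series on it. A sketch assuming a NumPy array of shape (n_timesteps,) or (n_timesteps, n_components); smooth_time_warping is a hypothetical name:

```python
import numpy as np
from scipy.interpolate import CubicSpline


def smooth_time_warping(values, num_knots=4, warp_std_dev=0.2, rng=None):
    """Length-preserving time warping: resample the series on a smoothly
    distorted, monotonically increasing time grid."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    n = len(values)
    t = np.arange(n)

    # Random local "speeds" around 1, smoothed with a cubic spline
    knot_positions = np.linspace(0, n - 1, num=num_knots)
    knot_speeds = 1.0 + rng.normal(0.0, warp_std_dev, num_knots)
    speeds = CubicSpline(knot_positions, knot_speeds)(t)
    speeds = np.clip(speeds, 1e-3, None)  # positive speeds: time never runs backwards

    # Cumulative sum gives an increasing warped clock; rescale it to [0, n - 1]
    warped_t = np.cumsum(speeds)
    warped_t = (warped_t - warped_t[0]) / (warped_t[-1] - warped_t[0]) * (n - 1)

    # Read the original series at the warped time stamps (linear interpolation)
    if values.ndim == 1:
        return np.interp(warped_t, t, values)
    return np.column_stack(
        [np.interp(warped_t, t, values[:, j]) for j in range(values.shape[1])]
    )


# Usage: a warped sine wave with the same length as the input
rng = np.random.default_rng(3)
x = np.sin(np.linspace(0, 6, 120))
y = smooth_time_warping(x, rng=rng)
```

The speeds play the same role as the random knots in magnitude warping, but they bend the time axis instead of the values. Clipping keeps the mapping monotonic, and the rescaling pins the first and last points in place.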

ashishkr23 commented 1 month ago

I am also looking for references/examples of how to add augmentations to training in darts, but I was unable to find anything. @Vignesh-Sampat, were you able to solve this? Thanks
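If no built-in hook is available, one fallback is offline augmentation: generate several augmented copies of each series up front and train on all of them. Darts' global models accept a list of series in fit(), so each array could be turned back into a TimeSeries (for example with TimeSeries.with_values; check the darts docs for the exact shape it expects) and passed in together with the original. The sketch below only covers the NumPy side; make_augmented_copies is a hypothetical helper, and it assumes length-preserving augmentations (so not the insert/delete time_warping above, whose output length changes).

```python
import numpy as np


def make_augmented_copies(values, augmentations, n_copies=5, rng=None):
    """Return [original, copy_1, ..., copy_n]; each copy applies one
    augmentation picked at random from the list."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    copies = [values]
    for _ in range(n_copies):
        augment = augmentations[rng.integers(len(augmentations))]
        copies.append(augment(values))
    return copies


# Usage: four augmented copies built from jitter and random scaling
rng = np.random.default_rng(5)
x = np.linspace(0, 1, 30)
augmentations = [
    lambda v: v + rng.normal(0.0, 0.01, v.shape),  # jitter
    lambda v: v * rng.normal(1.0, 0.1),            # random scaling
]
training_arrays = make_augmented_copies(x, augmentations, n_copies=4, rng=rng)
```

This trades the per-epoch freshness of on-the-fly augmentation for simplicity: the copies are fixed once generated, and memory grows with n_copies.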