ibm-granite / granite-tsfm

Foundation Models for Time Series

Normalization [Test] #82

Open francoisailah opened 2 months ago

francoisailah commented 2 months ago

Hello,

During time series normalization, are all data points within a channel (X) used to calculate the mean and standard deviation for normalization? Subsequently, is the normalized data split into past and future values for modeling?

Alternatively, is a forward normalization technique employed, where statistics are calculated on a rolling window basis for each prediction?

Thank you,

wgifford commented 2 months ago

If you make use of the get_datasets function here: https://github.com/ibm-granite/granite-tsfm/blob/main/tsfm_public/toolkit/time_series_preprocessor.py#L806

this is the process (assuming the preprocessor was instantiated with scaling=True):

  1. Data is split into train, validation, test according to the split_config.
  2. The training data is used to train the preprocessor: time_series_preprocessor.train(...)
  3. Each of the splits (train, validation, test) is preprocessed by the trained preprocessor: time_series_preprocessor.preprocess(...)
  4. Torch datasets are created from the preprocessed data. This process segments the time series data into windows of context_length followed by windows of prediction_length, which then become the past and future value tensors.
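
Putting those steps together, a minimal sketch of the flow (the dataset, column names, lengths, and split fractions below are illustrative placeholders; TimeSeriesPreprocessor and get_datasets are the objects from the linked module):

```python
import pandas as pd

from tsfm_public.toolkit.time_series_preprocessor import (
    TimeSeriesPreprocessor,
    get_datasets,
)

data = pd.read_csv("my_timeseries.csv", parse_dates=["date"])  # placeholder data

tsp = TimeSeriesPreprocessor(
    timestamp_column="date",       # placeholder column names
    id_columns=[],
    target_columns=["value"],
    context_length=512,
    prediction_length=96,
    scaling=True,                  # enables the train-fitted scalers
)

# split -> train the preprocessor on the train split -> preprocess each split
# -> build torch datasets of (past_values, future_values) windows
train_ds, valid_ds, test_ds = get_datasets(
    tsp, data, split_config={"train": 0.7, "test": 0.2}
)
```
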
francoisailah commented 2 months ago

It seems that the test_data is normalized based on the mean and standard deviation of the train_data, which makes sense. Thank you,

francoisailah commented 2 months ago

Hello,

Regarding `grp[self.target_columns] = self.target_scaler_dict[name].transform(grp[self.target_columns])`: where can I find the transform method?

wgifford commented 2 months ago

target_scaler_dict is a dictionary of scalers -- each entry is either a StandardScaler or a MinMaxScaler from sklearn.
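
So the transform method resolves to sklearn's scaler API. A minimal illustration with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train = np.array([[1.0], [2.0], [3.0], [4.0]])
scaler.fit(train)              # learns mean and std from the training data only

test = np.array([[5.0], [6.0]])
print(scaler.transform(test))  # scales test data using the train statistics
```
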

francoisailah commented 2 months ago

In this case, looking at the following code, the test_data (and likewise train_data and valid_data) is normalized by its own min and max (or mean and std). Then the data is split into past_values, future_values, etc. Is this correct?

```python
# get torch datasets
train_valid_test = [train_data, valid_data, test_data]
train_valid_test_prep = [ts_preprocessor.preprocess(d) for d in train_valid_test]
```

Thank you

wgifford commented 2 months ago

The preprocess method is called on each of train, valid, test. So it happens after splitting.
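
In other words (a sketch using the method names from this thread; train_data, valid_data, and test_data stand for the splits produced by the split_config):

```python
# the split happens first; the scalers are then fit on the train split only
ts_preprocessor.train(train_data)                    # fit scalers on train
train_prep = ts_preprocessor.preprocess(train_data)  # transform with train stats
valid_prep = ts_preprocessor.preprocess(valid_data)
test_prep = ts_preprocessor.preprocess(test_data)
```
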

francoisailah commented 2 months ago

In this case, normalizing the entire dataset could introduce look-ahead bias. Assume the vector X is normalized by its mean mu_X and standard deviation std_X, and then X is split into past values X1 and future values X2 (X = concat([X1, X2])), with the model forecasting X2 from X1. After normalization mu_X = 0, and since mu_X = w1 * mu_X1 + w2 * mu_X2 (where w1 = n1/(n1 + n2), w2 = n2/(n1 + n2), n1 = len(X1), and n2 = len(X2)), it follows that mu_X2 = -(w1/w2) * mu_X1. That means the model has some information about the unseen data X2.
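
This identity is easy to check numerically (an illustrative toy example, not code from the repo):

```python
import numpy as np

# normalize a whole series by its own mean/std, then split into past and future
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Xn = (X - X.mean()) / X.std()
X1, X2 = Xn[:4], Xn[4:]                    # past values, future values
w1, w2 = len(X1) / len(Xn), len(X2) / len(Xn)

# mu_X2 is fully determined by mu_X1: mu_X2 = -(w1/w2) * mu_X1
print(X2.mean(), -(w1 / w2) * X1.mean())   # prints the same value twice
```
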

Looking at the code, it seems that the scaler is fit on the train data, which means that the test_data is normalized using the mean and std of the train data (via transform). In this case, there is no look-ahead bias.

Thank you,

wgifford commented 1 month ago

Hi @francoisailah a couple points:

  1. As you state in the second part of your comment, the test data is never used to learn the scaling factors, so there is no look-ahead bias.
  2. We learn the mean on the entire train set, but the context and prediction windows are constructed in a rolling fashion and are generally much smaller than the full length of the train set. Pick one example context window (X1) and the following prediction window (X2). Then the entire dataset X can be represented as X = concat([W, X1, X2, Y]), where W and Y are the other parts of the train dataset that are not relevant to this particular training example. We know mu_X = 0, but now mu_X is a weighted combination of mu_W, mu_X1, mu_X2, and mu_Y -- with mu_W and mu_Y present, mu_X2 is no longer determined by mu_X1 alone, so no clear information about X2 passes to the model when it is given X1 (see the sketch below).
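
An illustrative sketch of that decomposition (all lengths and values here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.standard_normal(1000)   # stands in for the scaled train split (mean ~ 0)
context_length, prediction_length = 64, 16

start = 400  # one arbitrary rolling training example
X1 = train[start : start + context_length]                                        # past values
X2 = train[start + context_length : start + context_length + prediction_length]  # future values

# W = train[:start] and Y = train[start + context_length + prediction_length:]
# absorb the remaining mass, so mu_X2 is not pinned down by mu_X1 alone
print(X1.mean(), X2.mean())
```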