zhihanyue / ts2vec

A universal time series representation learning framework
MIT License
619 stars 148 forks source link

Evaluate code for univariate forcasting task maybe is not right. There is information leaking. #35

Open wiwi opened 1 year ago

wiwi commented 1 year ago

As title said that the results of univariate forcasting provide by this repository and the related paper maybe is not right because the evaluate code of univariate forcasting leads to information leaking. The following figure described the evaluate code of univariate forcasting in this repository.

image

The main problem is that eval_forecasting() encode data through sliding window(sliding step=1) and then split the training/validation/test data set. Hence, it will lead to information leaking. The general way to do this is splitting the data first and then encodes them respectively.

This bug is found in another repository which inherit evaluate code of TS2VEC, and we found the unfair evaluate code would contribute about 10% improvement on Electricity dataset about such project. Then, we checked the evaluate code of univariate forcasting in this repository and found the same operation.

So, I hope authors check the evaluate code of univariate forcasting, especially the result presented in the TS2VEC paper because maybe such results are wrong. If this problem is confirmed, I hope the authors will revise the results of the paper to avoid unfair comparisons.

zhihanyue commented 1 year ago

The figure you provided is wrong. We perform causal inference. For timestamp t, the sliding window is from t-T+1 to t. Therefore, there is no information leakage. The following is the corrected figure. WX20230526-100917@2x

zhihanyue commented 1 year ago

I have checked your another issue xingyu617/SimTS_Representation_Learning#5 and seem to understand your misunderstanding. The code you provided is inappropriate.

# original code
# all_repr = model.encode(
#     data[:, :, n_covariate_cols:],
#     casual=True,
#     sliding_length=1,
#     sliding_padding=padding,
#     batch_size=128
# )
# print("all_repr",data.shape, all_repr.shape)
# train_repr = all_repr[:, train_slice]
# valid_repr = all_repr[:, valid_slice]
# test_repr = all_repr[:, test_slice]

# your code
train_data = data[:, train_slice, n_covariate_cols:]
valid_data = data[:, valid_slice, n_covariate_cols:]
test_data = data[:, test_slice, n_covariate_cols:]
print("data shape:", train_data.shape, valid_data.shape, test_data.shape)

train_repr = model.encode(
    train_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)
valid_repr = model.encode(
    valid_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)
test_repr = model.encode(
    test_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)

For example, we assume t is the first timestamp in test set. For the first sample in the test set, the original input is [t-padding, t]. However, your input is [t, t], which feeds only one timestamp as input, resulting in poor performance and biased distribution.

wiwi commented 1 year ago

I have checked your another issue xingyu617/SimTS_Representation_Learning#5 and seem to understand your misunderstanding. The code you provided is inappropriate.

# original code
# all_repr = model.encode(
#     data[:, :, n_covariate_cols:],
#     casual=True,
#     sliding_length=1,
#     sliding_padding=padding,
#     batch_size=128
# )
# print("all_repr",data.shape, all_repr.shape)
# train_repr = all_repr[:, train_slice]
# valid_repr = all_repr[:, valid_slice]
# test_repr = all_repr[:, test_slice]

# your code
train_data = data[:, train_slice, n_covariate_cols:]
valid_data = data[:, valid_slice, n_covariate_cols:]
test_data = data[:, test_slice, n_covariate_cols:]
print("data shape:", train_data.shape, valid_data.shape, test_data.shape)

train_repr = model.encode(
    train_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)
valid_repr = model.encode(
    valid_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)
test_repr = model.encode(
    test_data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=128
)

For example, we assume t is the first timestamp in test set. For the first sample in the test set, the original input is [t-padding, t]. However, your input is [t, t], which feeds only one timestamp as input, resulting in poor performance and biased distribution.

You means that you use padding to seperate training set and testing set. But, I think I didnot misunderstand because the whole seqence are feed into eval_forecasting(). In the train.py:

elif task_type == 'forecasting':
            out, eval_res = tasks.eval_forecasting(model, data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols)

Here, 'data' is came from:

elif args.loader == 'forecast_csv_univar':
        task_type = 'forecasting'
        data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols = datautils.load_forecast_csv(args.dataset, univar=True)
        train_data = data[:, train_slice]

If you still think I misunderstand, please give me more detail information about which code to make sure the training set and testing set are independent. Thanks.

zhihanyue commented 1 year ago
  1. Please read the code of model.encode() carefully, and you can confirm that the real behavior is our corrected figure.
  2. I've already explained why your code is inappropriate. The following is what I think you misunderstood:

    For example, we assume t is the first timestamp in test set. For the first sample in the test set, the original input is [t-padding, t]. However, your input is [t, t], which feeds only one timestamp as input, resulting in poor performance and biased distribution.

We confirm that our code ensures the following behavior:

From Behavior 1, we conclude that there is no data leakage. If you think Behavior 1 is not true, please show the evidence rather than asking for code explaining. I'm sorry I don't have that much time.

Note that we are responsible for Behavior 1. This means that if you find Behavior 1 is not true under specific cases, please reopen the issue. Thanks.