Code release for "Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting" (NeurIPS 2021), https://arxiv.org/abs/2106.13008
MIT License
Potential data leakage issue in data_loader.py Dataset_Custom class #196
In the current implementation of dataset partitioning in the Dataset_Custom class, there seems to be a potential risk of data leakage between the training, validation, and test sets. The cause appears to be that the split boundaries (border1s and border2s) are defined without accounting for the sequence length (seq_len) before the dataset is partitioned.
Please correct me if my reasoning is wrong. :)
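To make the concern concrete, here is a small, self-contained sketch of the border arithmetic. It assumes a 70% train / 20% test / remaining-validation split, and that the validation and test start borders are shifted back by seq_len so that their first input window has a full history; that is my reading of the current Dataset_Custom, not a verbatim copy of it. The helpers compute_borders and overlap are hypothetical names used only for illustration.

```python
def compute_borders(n_rows: int, seq_len: int):
    """Hypothetical re-creation of the split logic.

    Assumes 70% train, 20% test, remainder validation, with the validation
    and test start borders moved back by seq_len rows.
    """
    num_train = int(n_rows * 0.7)
    num_test = int(n_rows * 0.2)
    num_vali = n_rows - num_train - num_test
    # Start borders: validation and test begin seq_len rows early so that
    # their first input window has a full history.
    border1s = [0, num_train - seq_len, n_rows - num_test - seq_len]
    # End borders (exclusive).
    border2s = [num_train, num_train + num_vali, n_rows]
    return border1s, border2s


def overlap(a, b):
    """Number of row indices shared by half-open ranges a=[a0, a1) and b=[b0, b1)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


if __name__ == "__main__":
    b1, b2 = compute_borders(n_rows=1000, seq_len=96)
    train, vali, test = zip(b1, b2)
    print("train", train, "vali", vali, "test", test)
    print("train/vali shared rows:", overlap(train, vali))
    print("vali/test shared rows:", overlap(vali, test))
```

Under these assumptions, each later split re-reads exactly seq_len rows from the end of the split before it (96 rows in this example), and those rows appear only in the input part of its first windows.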
Suggested Modification
To prevent data leakage and ensure that each partition works with distinct, non-overlapping data points, I suggest subtracting seq_len from the total data length before defining the partitions. Here is a suggested modification:
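As a point of comparison, here is a minimal sketch of one way to obtain disjoint partitions. It does not reproduce the exact change proposed above; instead it simply drops the seq_len shift from border1s so the three row ranges never touch. It assumes the same 70/20/remainder split, and that a split yields (rows - seq_len - pred_len + 1) samples, one per full input-plus-target window. The function name non_overlapping_borders and its ratio parameters are hypothetical.

```python
def non_overlapping_borders(n_rows: int, seq_len: int, pred_len: int,
                            train_ratio: float = 0.7, test_ratio: float = 0.2):
    """Hypothetical variant: split rows into disjoint [start, end) ranges.

    No split reaches back into the previous one, so the first seq_len rows
    of validation and test serve only as model input, never as targets.
    """
    num_train = int(n_rows * train_ratio)
    num_test = int(n_rows * test_ratio)
    num_vali = n_rows - num_train - num_test
    border1s = [0, num_train, num_train + num_vali]
    border2s = [num_train, num_train + num_vali, n_rows]

    window = seq_len + pred_len
    for name, b1, b2 in zip(("train", "vali", "test"), border1s, border2s):
        # Assumed sample count: one sample per full input+target window.
        n_samples = (b2 - b1) - window + 1
        if n_samples < 1:
            raise ValueError(
                f"{name} split has {b2 - b1} rows, fewer than "
                f"seq_len + pred_len = {window}"
            )
    return border1s, border2s


if __name__ == "__main__":
    print(non_overlapping_borders(n_rows=2000, seq_len=96, pred_len=24))
```

The trade-off of this variant is that validation and test each lose seq_len candidate target positions, and a short split (for example 100 validation rows with seq_len=96, pred_len=24) can no longer form a single window, which the check above reports as an error.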