Closed: rajatsen91 closed this issue 1 year ago.
Indeed, for different batch sizes the test set will be different, since we drop the last batch. I think this issue is similar to https://github.com/yuqinie98/PatchTST/issues/7
Yes, indeed. Sorry, I missed the previous issue. I also found that this changes the results fairly significantly on the ETTh2 dataset as well.
Yes, that is a valid question. We noticed this when @oguiza raised the issue. For the self-supervised model, the results are reported with drop_last=False, and as shown in the paper, the performance was not affected much.
The prediction and ground-truth sets here are not the same size across batch sizes. This means the computed metrics are not directly comparable across models evaluated with different batch sizes.
Therefore, if my understanding is correct, all models might need to be re-evaluated.
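To make the effect concrete, here is a small illustration (the test-set size below is hypothetical, not taken from the repo): with drop_last=True, only complete batches are evaluated, so the number of test windows that get scored depends on the batch size.

```python
# Hypothetical illustration: with drop_last=True, only complete batches
# are evaluated, so the effective test-set size varies with batch size.
n_test = 2785  # hypothetical number of test windows

for batch_size in (32, 64, 128):
    evaluated = (n_test // batch_size) * batch_size  # full batches only
    dropped = n_test - evaluated
    print(f"batch_size={batch_size}: {evaluated} windows scored, {dropped} dropped")
```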
I think the culprit is here: https://github.com/cure-lab/LTSF-Linear/blob/main/data_provider/data_factory.py#L20
drop_last should be False during test evaluation.
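A minimal sketch of the fix, assuming a hypothetical `build_dataset` helper (the repo's actual `data_provider` selects among several Dataset classes): force drop_last=False when building the test loader, so every test window is scored regardless of batch size.

```python
from torch.utils.data import DataLoader

def data_provider(args, flag):
    """Build dataset and loader; a sketch only, not the repo's exact code."""
    data_set = build_dataset(args, flag)  # hypothetical helper standing in for the repo's Dataset setup
    if flag == 'test':
        # Keep the final partial batch so metrics cover the full test set
        # and are comparable across batch sizes.
        shuffle_flag, drop_last = False, False
    else:
        shuffle_flag, drop_last = True, True
    data_loader = DataLoader(
        data_set,
        batch_size=args.batch_size,
        shuffle=shuffle_flag,
        num_workers=args.num_workers,
        drop_last=drop_last,
    )
    return data_set, data_loader
```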