yuqinie98 / PatchTST

An official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
Apache License 2.0

Comparisons might not be consistent across batch sizes #25

Closed: rajatsen91 closed this issue 1 year ago

rajatsen91 commented 1 year ago

The prediction and ground-truth tensor sizes here are not the same across batch sizes. This means that the computed metrics are not exactly comparable across models trained with different batch sizes.

Therefore, if my understanding is correct, all models might need to be re-evaluated.

I think the culprit is here: https://github.com/cure-lab/LTSF-Linear/blob/main/data_provider/data_factory.py#L20

`drop_last` should be `False` during test evaluation.
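
For illustration, here is a minimal, self-contained sketch (not taken from the repository) of how `drop_last=True` changes how many test samples get scored; the toy dataset and sizes are made up for the example:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy test set of 100 windows (the values are irrelevant to the point).
test_set = TensorDataset(torch.arange(100, dtype=torch.float32).unsqueeze(-1))

for batch_size in (32, 64):
    # drop_last=True silently discards the final incomplete batch,
    # so the number of evaluated samples depends on the batch size.
    loader = DataLoader(test_set, batch_size=batch_size,
                        shuffle=False, drop_last=True)
    n_eval = sum(x.shape[0] for (x,) in loader)
    print(f"batch_size={batch_size}: evaluated {n_eval} of 100 samples")

# Prints 96 for batch_size=32 and 64 for batch_size=64. With
# drop_last=False, all 100 samples are evaluated in both cases,
# making metrics comparable across batch sizes.
```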

ikvision commented 1 year ago

Indeed, for different batch sizes the effective test set will differ, since we drop the last incomplete batch. I think this issue is similar to https://github.com/yuqinie98/PatchTST/issues/7
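
As a sketch of the suggested fix, here is a hypothetical, simplified version of the `data_provider` function linked above; the structure is paraphrased and the real function also builds the dataset and handles more flags:

```python
from torch.utils.data import DataLoader

def data_provider(dataset, args, flag):
    # Hypothetical, simplified signature for illustration only.
    if flag == 'test':
        shuffle_flag = False
        # Suggested fix: keep the final incomplete batch at test time so
        # every test window is scored, regardless of batch size.
        drop_last = False  # the linked code sets drop_last = True here
    else:
        shuffle_flag = True
        drop_last = True   # fine for training: keeps batch shapes uniform
    return DataLoader(dataset,
                      batch_size=args.batch_size,
                      shuffle=shuffle_flag,
                      drop_last=drop_last)
```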

rajatsen91 commented 1 year ago

Yes, indeed. Sorry, I missed the previous issue. I also found that this changes the results fairly significantly on the ETTh2 dataset as well.

namctin commented 1 year ago

Yes, that is a valid point. We noticed this when @oguiza raised the issue. For the self-supervised model, the results are reported with `drop_last=False`, and as shown in the paper, the performance is not affected much.