thuml / Time-Series-Library

A Library for Advanced Deep Time Series Models.

Comparison with results from original M4 participants? #293

Closed cbergmeir closed 9 months ago

cbergmeir commented 9 months ago

You state on your leaderboard that TimesNet is the best for "short-term forecasting", which I take to mean the best on the M4 dataset. Your TimesNet paper reports an OWA of 0.851 for your method and of 0.855 for N-BEATS.

The M4 competition was won by a method with an OWA of 0.821; see Table 4 in:

Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1), 54-74.

Also, the original N-BEATS paper reports a considerably better OWA, namely 0.795; see:

https://arxiv.org/pdf/1905.10437.pdf

I'm wondering where these differences come from. Are you only using a subset of M4, for example?
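
For context, the OWA figures quoted above combine sMAPE and MASE, each normalized by the Naive2 benchmark of the M4 competition. A minimal sketch of the metric (not the code from this repo; the Naive2 reference values have to be taken from the published M4 results):

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE in percent, as used in M4."""
    return 200.0 * np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

def mase(y_true, y_pred, insample, seasonality):
    """Mean Absolute Scaled Error; scaled by the in-sample seasonal-naive error."""
    scale = np.mean(np.abs(insample[seasonality:] - insample[:-seasonality]))
    return np.mean(np.abs(y_true - y_pred)) / scale

def owa(smape_model, mase_model, smape_naive2, mase_naive2):
    """Overall Weighted Average relative to the Naive2 benchmark (OWA of Naive2 is 1.0)."""
    return 0.5 * (smape_model / smape_naive2 + mase_model / mase_naive2)
```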

wuhaixu2016 commented 9 months ago

Hi, we use the complete M4 dataset.

As we stated in the paper, N-BEATS employs a special ensemble method that combines the predictions obtained from different input lengths. Its final results are an ensemble of 7 models.

Thus, to ensure a fair comparison, we test all models with a single input length.
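
For illustration, an N-BEATS-style ensemble aggregates (e.g., with the median) the forecasts of models trained with different input lengths, whereas the single-model setting evaluates one fixed input length. A rough sketch, with `train_and_forecast` as a hypothetical helper rather than an API of this repo:

```python
import numpy as np

def ensemble_forecast(train_series, horizon, input_lengths, train_and_forecast):
    """Ensemble setting: one model per input length, median-combined forecasts."""
    forecasts = [train_and_forecast(train_series, horizon, lookback=length)
                 for length in input_lengths]
    return np.median(np.stack(forecasts), axis=0)

def single_model_forecast(train_series, horizon, input_length, train_and_forecast):
    """Single-model setting used on this leaderboard: one fixed input length."""
    return train_and_forecast(train_series, horizon, lookback=input_length)
```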

cbergmeir commented 9 months ago

What about the original participants in the M4 then?

wuhaixu2016 commented 9 months ago

Note that, unlike in a competition, a fair comparison is essential for research. Without a fair comparison, we cannot draw any scientific conclusions.

If you want to apply TimesNet to M6 or other competitions, you can also apply an ensemble strategy to TimesNet, as N-BEATS does. But as we stated in the README.md, this repo is meant to provide a clean code base for researchers, so we stick to the single-model comparison configuration.

For more information:

Our leaderboard compares only a selection of deep models. If you read the README.md carefully, you will find this description: “Compared models of this leaderboard. ☑ means that their codes have already been included in this repo.”

We believe there are other excellent time series forecasting models, some of them possibly non-DL models. But as stated in the README.md, this repo only compares the listed deep models.