rakshitha123 / TSForecasting

This repository contains the implementations of the experiments on a set of publicly available datasets used in time series forecasting research.
https://forecastingdata.org/

Unable to reproduce the results from paper #11

Closed 18kiran12 closed 2 years ago

18kiran12 commented 2 years ago

Hi,

The Monash Forecasting Repository and your work are greatly appreciated. Thanks a lot for making the work reproducible.

However, I tried experimenting with a few datasets and found that I was unable to reproduce the results for local models such as ARIMA, ETS, and SES. Below are the results I obtained for the COVID dataset. I used the same script and the same lag and horizon values. Could you please let me know if I am doing something wrong? Should I change some parameters of these local models to attain the results reported in the paper?

These are the results I got for the COVID dataset.

| | ETS | TBATS | SES | ARIMA |
| --- | --- | --- | --- | --- |
| My experiment | 8.98 | 8.98 | 8.977 | 6.104 |
| Published results | 5.33 | 5.72 | 7.776 | 6.117 |

Additionally, I found that TBATS, ETS, and SES almost always show the same error. Do you have an idea why this could be the case for the COVID dataset?
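For reference, this is roughly how I invoke the three models (a sketch with placeholder data and horizon, not the exact repo script):

```r
library(forecast)

# Placeholder series; in my runs the data comes from the .tsf file.
series <- ts(rnorm(100), frequency = 7)
h <- 30  # example horizon for this sketch; I used the repo's value in my runs

fc_ets   <- forecast(ets(series),   h = h)
fc_tbats <- forecast(tbats(series), h = h)
fc_ses   <- ses(series, h = h)
```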

rakshitha123 commented 2 years ago

Hi,

Thanks for your interest in our work.

I just executed the above-mentioned models again for the COVID Deaths dataset, and I obtained the same mean MASE results as reported in the paper.

May I know the version of the "forecast" package you are using? Our experiments were conducted with version 8.12 of the forecast package.
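You can check the installed version with, for example:

```r
packageVersion("forecast")  # our experiments used 8.12
```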

Can you also please confirm how you executed the experiments? For example, to run SES for the COVID Deaths dataset, you only need to run line 247 of fixed_horizon.R. Can you please confirm whether you executed the experiments in that way, just to make sure all the parameters are the same?

18kiran12 commented 2 years ago

Hi,

Thanks a lot for the quick reply. I am new to R and did not realise that `library("forecast")` was not loaded. Strangely, R also did not throw an error asking me to load the library. Because of this, I was unable to reproduce the results even with the same script and the same dataset.
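For anyone hitting the same issue, this is the kind of check I should have run first (generic R, not part of the repo scripts):

```r
library(forecast)
packageVersion("forecast")          # the paper's experiments used 8.12
environmentName(environment(ets))   # should print "forecast"
```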

On a side note, could you also publish the versions of R and the required packages, so that others can reproduce the results more easily?

Thanks a lot again.

rakshitha123 commented 2 years ago

Hi,

That's a really great suggestion. Accordingly, I have now added a table to the README file listing the versions of all the software/packages we used.

shchur commented 1 year ago

Hi @rakshitha123, thank you for this great work and making the code & the datasets public!

I encountered the same problem, where I couldn't reproduce the published results locally. Do you happen to know what the source of these discrepancies might be?

I re-ran the code from fixed_horizon.R and made sure that the correct package versions were installed (forecast_8.12 and smooth_2.6.0).

I also tried re-implementing the evaluation pipeline in Python, using the R code as a reference, and the results I get in Python are identical to what I get in R (both different from the paper). For example, for the vehicle_trips_dataset_without_missing_values dataset I obtained the following mean MASE scores:

| | TBATS | SES | Theta | ETS | ARIMA | PR |
| --- | --- | --- | --- | --- | --- | --- |
| My experiment | 1.856 | 2.273 | 1.914 | 1.964 | 2.051 | 2.196 |
| Published | 1.860 | 1.224 | 1.244 | 1.305 | 1.282 | 1.212 |

As a sanity check, I also ran the snaive baseline, implemented both in Python and in R, and in both cases obtained a mean MASE score of 2.026, which is quite close to the other results from my experiment.
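This is roughly the snaive sanity check I ran on the R side (illustrated here with a built-in series rather than the actual vehicle trips data):

```r
library(forecast)

# Placeholder train/test split on a built-in series:
y     <- AirPassengers
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

fc <- snaive(train, h = length(test))
accuracy(fc, test)["Test set", "MASE"]  # MASE of the seasonal-naive forecast
```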

Do you have any pointers as to what I could do next to find the problem in my setup?

rakshitha123 commented 1 year ago

Hello @shchur,

Thanks for your interest in our work.

May I kindly know whether the other error measures (mean msMAPE, mean MAE, mean RMSE, etc.) calculated on the vehicle trips dataset are also different from the published results in the paper/appendix? I would like to know whether the mismatch comes from the MASE calculation or from the generated forecasts; MASE is the only measure here whose scaling depends on the seasonal period.

shchur commented 1 year ago

Thank you for the quick response!

For mean/median sMAPE/MAE (and all other non-seasonal metrics), the results for PR and SES are identical to the paper, but for ETS and Theta the numbers are quite different.

It looks as if the problem is with how the seasonal period is calculated in the code: ETS and Theta use the seasonal period when fitting (via the ts frequency), while SES and PR do not, which would explain this pattern.
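For instance, here is a toy sketch (made-up data, not the actual series) of the mechanism I suspect: ets() reads the seasonal period from the ts frequency, so a wrong frequency changes the fitted model and hence the forecasts:

```r
library(forecast)

set.seed(1)
y <- rnorm(200)  # placeholder data

# The same values fitted under two different assumed seasonal periods
# can yield different models and therefore different forecasts:
fit7  <- ets(ts(y, frequency = 7))
fit24 <- ets(ts(y, frequency = 24))
```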

shchur commented 1 year ago

I played around with the code a bit more and observed that if I change `SEASONALITY_VALS[[7]] <- 7` to `SEASONALITY_VALS[[7]] <- 24` in fixed_horizon_functions.R, then both the MASE results for all models and the sMAPE/MAE scores for Theta and ETS become almost equal to the ones published in the appendix.
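For context, here is a minimal sketch of the standard MASE definition (Hyndman & Koehler, 2006), not the repo code, showing how the score scales with the assumed seasonal period m:

```r
# MASE = mean absolute error of the forecasts, scaled by the in-sample
# mean absolute error of the seasonal-naive method with period m:
mase <- function(actual, forecasts, insample, m) {
  scale <- mean(abs(diff(insample, lag = m)))
  mean(abs(actual - forecasts)) / scale
}
# Changing m (e.g. 7 vs 24) changes the denominator for every model,
# which would explain the MASE shifting for all models at once.
```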

Could it be that there was an off-by-one error in the original version of the experiments that is the source of this discrepancy?

shchur commented 1 year ago

Hi @rakshitha123, can you please check whether you can reproduce my findings using the latest version of the code from this repo? I'm not sure whether I should continue looking for bugs in my experimental setup or rather re-run the experiments using the provided code and use those results as the reference. Since I'm a total newbie in R, there is a possibility that I misconfigured something, but I'm not sure how to validate or refute this hypothesis 😅