Metric fix, added GluonTS for Baselines, added AutoPytorchTS, added Local Code Option

limpbot commented 1 year ago

Hi @sebhrusen, @PGijsbers, I am strongly hoping to get this PR accepted by Christmas. If you could review it in time for me to make the required changes, I would highly appreciate it.

This PR contains the following changes.

Renamed forecast_range_in_steps to forecast_horizon_in_steps, because "forecast horizon" is the much more common term in the research field.

Fixed the metrics calculation. A) Changed y_past_period_error to the seasonal error, because this is what is normally used for MASE. Specifying the seasonality fixes how the metric is calculated. B) Renamed ncrps (Normalized Continuous Ranked Probability Score) to mwql (Mean Weighted Quantile Loss), because that is what is actually calculated. It is an approximation of ncrps. C) Added an item_id column to the results, to be able to calculate itemwise_mean.

Added the GluonTS framework, because it makes it possible to compute multiple time series baselines. GluonTS_* 1) Prophet, 2) DeepAR, 3) NBEATS, 4) NPTS, 5) SeasonalNaive, 6) SimpleFeedForward, 7) MQCNN, 8) MQRNN, 9) TFT, 10) ARIMA, 11) ETS, 12) STL-AR, 13) Theta.

Added AutoPyTorchTS for comparison. Only time series for now; the tabular implementation is missing.

Added improvements for AutoGluonTS. A) Do not deepcopy the dataset to save memory. B) Use correct module to retrieve the AutoGluon version.

Added a "local" framework configuration file for testing local code. This is achieved by forwarding USER_DIR to the setup.sh script via environment variables.
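For readers unfamiliar with the two metrics mentioned in the metrics fix above, here is a minimal sketch of how a seasonal-error-scaled MASE and a mean weighted quantile loss are commonly defined. It is not the code from this PR: the function names, argument names, and the dict-based quantile format are assumptions made for illustration, and the factor of 2 in the quantile loss follows the convention I believe GluonTS uses.

```python
from statistics import fmean


def seasonal_error(y_past, seasonality=1):
    """Mean absolute error of the seasonal naive forecast on the past window."""
    if len(y_past) <= seasonality:
        raise ValueError("need more than `seasonality` past observations")
    diffs = [
        abs(y_past[t] - y_past[t - seasonality])
        for t in range(seasonality, len(y_past))
    ]
    return fmean(diffs)


def mase(y_true, y_pred, y_past, seasonality=1):
    """Mean absolute error on the horizon, scaled by the seasonal error."""
    mae = fmean([abs(a - p) for a, p in zip(y_true, y_pred)])
    return mae / seasonal_error(y_past, seasonality)


def mean_weighted_quantile_loss(y_true, quantile_forecasts):
    """Average over quantile levels of 2 * pinball loss / sum(|y_true|).

    `quantile_forecasts` maps a quantile level q (0 < q < 1) to a list of
    predictions, one per step of the horizon (hypothetical format).
    """
    abs_target_sum = sum(abs(a) for a in y_true)
    losses = []
    for q, preds in quantile_forecasts.items():
        pinball = sum(
            q * (a - p) if a >= p else (1 - q) * (p - a)
            for a, p in zip(y_true, preds)
        )
        losses.append(2 * pinball / abs_target_sum)
    return fmean(losses)


# Toy data: past values increase by 1 per step, so the lag-2 difference is always 2.
example_mase = mase([7, 8], [6, 10], y_past=[1, 2, 3, 4, 5, 6], seasonality=2)
example_mwql = mean_weighted_quantile_loss([10, 20], {0.5: [8, 24]})
```

Averaging the quantile loss over a finite set of quantile levels is what makes mwql an approximation of a CRPS-style score rather than the score itself.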

Kind regards, Leo

FYI: @Innixma @canerturkmen @gidler

limpbot commented 1 year ago

Thank you for your quick comments. Couldn't work on it last week because I was out of the office.

limpbot commented 1 year ago

Removed the local frameworks configuration file and the tag in the general configuration file. I also moved the time series frameworks into a separate frameworks_timeseries.yaml file.

PGijsbers commented 1 year ago

Unfortunately, I did not find the time to actually try and run the code. I am running into HDF5 issues building docker containers (MacOS is not supported for GluonTS), but I don't think it's specific to this PR.

I am still missing documentation on the dataset formats and the introduced dataset meta-data (the attributes forecast_horizon_in_steps, id_column, timestamp_column, seasonality). While some of them are somewhat self-explanatory, a brief description of what they are and which data formats are allowed for them is important for other people to be able to use it with their datasets.
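To make the request concrete, here is a rough sketch of how these four attributes could relate to a long-format dataset. This is only one plausible reading, not the format documented or implemented by the PR: the attribute names come from the PR, while the values, the `target` column, the ISO date strings, and the `split_by_horizon` helper are all hypothetical.

```python
from collections import defaultdict

# Attribute names are from the PR; every value below is made up for illustration.
metadata = {
    "forecast_horizon_in_steps": 2,   # number of final steps per series to forecast
    "id_column": "item_id",           # column identifying each individual series
    "timestamp_column": "timestamp",  # column holding the time index
    "seasonality": 7,                 # assumed period used for the seasonal error
}

# Hypothetical long-format data: one row per (series, timestamp) pair.
rows = [
    {"item_id": "A", "timestamp": "2022-01-01", "target": 10.0},
    {"item_id": "A", "timestamp": "2022-01-02", "target": 11.0},
    {"item_id": "A", "timestamp": "2022-01-03", "target": 12.0},
    {"item_id": "A", "timestamp": "2022-01-04", "target": 13.0},
    {"item_id": "B", "timestamp": "2022-01-01", "target": 5.0},
    {"item_id": "B", "timestamp": "2022-01-02", "target": 4.0},
    {"item_id": "B", "timestamp": "2022-01-03", "target": 6.0},
]


def split_by_horizon(rows, metadata):
    """Hold out the last `forecast_horizon_in_steps` rows of every series."""
    horizon = metadata["forecast_horizon_in_steps"]
    if horizon <= 0:
        raise ValueError("forecast horizon must be positive")
    by_item = defaultdict(list)
    for row in rows:
        by_item[row[metadata["id_column"]]].append(row)
    train, test = [], []
    for item_rows in by_item.values():
        # ISO-formatted date strings sort chronologically as plain strings.
        item_rows.sort(key=lambda r: r[metadata["timestamp_column"]])
        train.extend(item_rows[:-horizon])
        test.extend(item_rows[-horizon:])
    return train, test


train, test = split_by_horizon(rows, metadata)
```

Under this reading, each series contributes its last two observations to the test split and everything earlier to the training split.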

Otherwise, if @sebhrusen OKs this then I am good with it too.

With this I am signing off for the year. Happy holidays and see you in January :)

limpbot commented 1 year ago

Just added some documentation in the HOWTO.md file. Thank you for the intensive review. Today marks my last day as an intern; @shchur will take over this PR next year. It was very motivating for me to contribute something to this great open-source benchmark solution, which enables fair comparison and reduces the stress of reproducing competitors' results.

Happy Holidays!

PGijsbers commented 1 year ago

Thank you very much for the contribution 🙏 🎉

shchur commented 1 year ago

Hi @PGijsbers and @sebhrusen, thank you for the comments on the PR! Are there any remaining items that need to be addressed before the PR can be merged?

Innixma commented 1 year ago

@sebhrusen @PGijsbers Would greatly appreciate if we can understand next steps for this PR. We are happy to address any further review comments you have.

PGijsbers commented 1 year ago

At last, I found the time to have another look! I tried to run the provided example containerized in docker with the command:

python runbenchmark.py autogluonts:timeseries timeseries test -m docker

But unfortunately it currently fails on install. I first applied the fixes from #495 to make sure the regular setup still worked, which it did (python runbenchmark.py autogluon test test -m docker). I also added 'timeseries' as a valid framework tag to config.yaml to ensure the right version could be found, and cleared any local docker images of automlbenchmark/autogluon. Then, running the command fails with:

#25 102.2 ModuleNotFoundError: No module named 'autogluon.timeseries'

which I think originates from an earlier failure to install dependencies:

#25 30.99 ERROR: Could not find a version that satisfies the requirement catboost<0.25,>=0.23.0; extra == "all" (from autogluon-tabular[all]) (from versions: none)

I also tried fixing the version to an old AutoGluon which matches the period this review was requested (0.6.1), and ultimately get a similar error message:

#25 24.03 ModuleNotFoundError: No module named 'autogluon'

likely because of

#25 16.52 ERROR: No matching distribution found for ray[tune]<2.1,>=2.0; extra == "all"

The main underlying issue would be the outdated Python version (a concern already raised by Nick in #511), but in principle I don't understand why the old version couldn't simply be installed with 3.7-compatible dependency versions. Full output here.

We will raise the Python versions, but I believe this should still work without it (at least for versions compatible with Py3.7).

Innixma commented 1 year ago

@PGijsbers Interesting, that is an odd error. I'd say we wouldn't be very concerned with supporting v0.6.x AG given that v0.7 is released, so my recommendation is to first update the Python version and see if v0.7.x works. If not, I can look into fixing it.

Let me know if there is a particular reason v0.6.x support is important for AMLB here.

Looking at the specific error, this looks very strange because the catboost version that is being specified is very old. It almost looks like it is trying to install an old version of AutoGluon (older than v0.6.x), but I don't know why that would be the case.

Looking at our old releases, the version that matches that catboost version range requirement is AutoGluon 0.1.0:

https://github.com/autogluon/autogluon/blob/0.1.0/tabular/setup.py#L41

That is over 2 years old, so I don't know why it would be trying to install such an old version of AutoGluon...

PGijsbers commented 11 months ago

While not everything in this PR was in #564, it is my understanding that this PR is superseded and there will be other separate PRs for e.g., adding TS support for other frameworks. Closing this PR, just ping if that's not correct.

PGijsbers commented 11 months ago

Thank you all for your efforts and patience! And I am happy we finally got around to merging things with #564 🎉

openml / automlbenchmark

Metric fix, added GluonTS for Baselines, added AutoPytorchTS, added Local Code Option #507