Closed · shchur closed this 1 year ago
Thanks for updating the PR, sorry for our unresponsiveness on the old one. We currently have an open fedot integration PR and according to their readme they also do "time series prediction problems". Would you be willing to talk with @nicl-nno and see if the two of you can add time series support for the FEDOT integration as well? That would be a good test as to whether the abstraction levels/supported features are sufficient to generalize this beyond just AutoGluonTS.
I am planning to have a closer look at the FEDOT PR tomorrow, but based on a first impression I expect to be able to merge it this week.
Thanks! Sure, I will be happy to help with integrating the time series support for FEDOT.
The current PR does not really introduce new functionality and mostly refactors / fixes the existing code (to make it runnable), so I think it would still be useful, unless we plan to completely refactor all time series code in AMLB.
In my opinion, it would make sense to work on the FEDOT time series PR after #563 and this PR are merged. Current abstractions are readily compatible with other libraries like StatsForecast, GluonTS, AutoPyTorch, so I don't expect there to be a lot of work needed for FEDOT. But if any changes to the core TS logic will be necessary for FEDOT, I'm happy to take care of them. Does that sound reasonable to you?
I agree with @shchur that it is better to separate the TS support for FEDOT into a new PR.
In case it's helpful: we have small self-developed benchmark for time series forecasting named pytsbe. It already contains integration with several AutoML solutions (FEDOT, LAMA, AutoTS, AutoGluon) and baselines. Maybe some code snippets will be useful for AMLB development.
@PGijsbers I have just double-checked, the code works locally, on Docker, and on AWS, so the PR is ready for review.
Sure, that sounds fine. I'll have a look at both this week (hopefully starting tomorrow). Thanks!
Thank you for the review and sorry for missing the relevant discussion on the previous PR!
This PR only contains the minimal changes necessary to make the code runnable and to correct the metric definitions. I have not made any changes to the dataset schema - but I think that the current schema is not ideal and could be substantially improved. I would like to incorporate these changes before updating the documentation. Below is a short overview of the current time series dataset schema and a possible modification.
If these changes look good to you, I will incorporate them into the PR and document them in `HOWTO.md`.
At the bare minimum, each time series task must be defined by

- `forecast_horizon_in_steps` - a positive integer that specifies how many steps into the future need to be forecast for each time series (= the number of future time series values that need to be predicted)
- two CSV files:
  - `train.csv` with 3 columns (`item_id` - unique ID of each time series, `timestamp` of the observation, `target` - time series value), with contents such as

    item_id | timestamp | target
    ---|---|---
    A | 2020-01-01 | 2.0
    A | 2020-01-02 | 4.0
    B | 2020-05-01 | 1.0
    B | 2020-05-02 | 3.0
    B | 2020-05-03 | 1.0

  - `test.csv` with contents such as

    item_id | timestamp | target
    ---|---|---
    A | 2020-01-01 | 2.0
    A | 2020-01-02 | 4.0
    A | 2020-01-03 | 5.0
    B | 2020-05-01 | 1.0
    B | 2020-05-02 | 3.0
    B | 2020-05-03 | 1.0
    B | 2020-05-04 | 0.0

(in this example the dataset contains 2 time series `['A', 'B']` and `forecast_horizon_in_steps = 1`)
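As a sanity check, the relationship between the two files can be sketched with pandas using the example data above (variable names are illustrative; this is not AMLB code):

```python
import pandas as pd

# Minimal sketch of the schema described above, built from the example tables.
forecast_horizon_in_steps = 1

train = pd.DataFrame({
    "item_id":   ["A", "A", "B", "B", "B"],
    "timestamp": pd.to_datetime(["2020-01-01", "2020-01-02",
                                 "2020-05-01", "2020-05-02", "2020-05-03"]),
    "target":    [2.0, 4.0, 1.0, 3.0, 1.0],
})
test = pd.DataFrame({
    "item_id":   ["A", "A", "A", "B", "B", "B", "B"],
    "timestamp": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                                 "2020-05-01", "2020-05-02", "2020-05-03",
                                 "2020-05-04"]),
    "target":    [2.0, 4.0, 5.0, 1.0, 3.0, 1.0, 0.0],
})

# test.csv must contain each train series plus exactly
# `forecast_horizon_in_steps` extra future values.
extra = test.groupby("item_id").size() - train.groupby("item_id").size()
assert (extra == forecast_horizon_in_steps).all()
```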
The `train.csv` and `test.csv` files must satisfy the following conditions:

- `timestamp`s for each individual time series are regularly sampled (e.g., daily, monthly, or 30-minute frequency).
- For each time series in `train.csv`, `test.csv` contains all past values + exactly `forecast_horizon_in_steps` future values corresponding to the forecast horizon.

This dataset schema has a major limitation: the `forecast_horizon_in_steps` is "baked into" the `train.csv`/`test.csv` files. If we wanted to evaluate on the same dataset with a different value of `forecast_horizon_in_steps`, we would need to re-generate the `train.csv` file.
To address this limitation, I propose the following modification to the dataset schema:

- We only store a single file `test.csv` that is identical to the test file in the current schema.
- The train set (called `train.csv` before) is generated automatically by AMLB by removing the last `forecast_horizon_in_steps` values from each time series in the test set.
- If the specified `forecast_horizon_in_steps` is >= the length of the shortest time series in `test.csv`, an exception is raised.
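The proposed train-set generation can be sketched in a few lines of pandas (a hypothetical helper illustrating the proposal, not actual AMLB code):

```python
import pandas as pd

def make_train_set(test: pd.DataFrame, forecast_horizon_in_steps: int) -> pd.DataFrame:
    """Derive the train set from test.csv by dropping the last
    `forecast_horizon_in_steps` rows of each series.

    Hypothetical sketch of the proposed schema; not AMLB code.
    """
    shortest = test.groupby("item_id").size().min()
    if forecast_horizon_in_steps >= shortest:
        raise ValueError(
            "forecast_horizon_in_steps must be smaller than the shortest series"
        )
    # GroupBy.head with a negative n keeps all but the last |n| rows per group
    return test.groupby("item_id", sort=False).head(-forecast_horizon_in_steps)
```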
This also makes it easy to evaluate over multiple folds for time series data: for each fold `i` in `[0, 1, 2, 3, ...]`, we define

- the test set by removing the last `i * forecast_horizon_in_steps` values from each original time series in `test.csv`
- the train set by removing the last `(i+1) * forecast_horizon_in_steps` values from each original time series in `test.csv`

Please let me know if this design looks reasonable to you.
> This also makes it easy to evaluate over multiple folds for time series data: for each fold `i` in `[0, 1, 2, 3, ...]`, we define
> - the test set by removing `i * forecast_horizon_in_steps` from each original time series in `test.csv`
> - the train set by removing `(i+1) * forecast_horizon_in_steps` from each original time series in `test.csv`
I think that all makes sense, but the description seems off to me. Given `forecast_horizon_in_steps = K`, then you want to have:

rows | fold 0 | fold 1 | fold 2 | ... | fold n
---|---|---|---|---|---
1..K | train | train | train | ... | train
K..2K | test | train | train | ... | train
2K..3K | | test | train | ... | train
3K..4K | | | test | ... | train
... | | | | |
(n-1)K..nK | | | | | test

Right? The way I read your description, the test set actually grows by `forecast_horizon_in_steps` each fold. And I imagine it is useful to be able to specify a minimum train set greater than `forecast_horizon_in_steps` (e.g., training data is one year, and the folds consist of subsequent days with a forecast horizon of 3 days).
It seems that we are describing the same thing, but I was numbering the folds in reverse order :)

rows | fold 0 | fold 1 | ... | fold (n-1) | fold n
---|---|---|---|---|---
1..K | train | train | ... | train | train
K..2K | train | train | ... | train | test
2K..3K | train | train | ... | test |
... | | | | |
(n-2)K..(n-1)K | train | test | | |
(n-1)K..nK | test | | | |

This means fold 0 always corresponds to evaluation on the data with the most recent `K` timestamps for each time series (which recovers the setting used in most forecasting competitions). This way, if the user decides to evaluate on folds `[0, 1, 2, ..., n-1]`, then the `n` folds with the largest amount of training data will be used.
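The reversed fold numbering can be sketched as follows (a hypothetical helper assuming pandas and the `item_id` column from the proposed schema; not actual AMLB code):

```python
import pandas as pd

def fold_split(test: pd.DataFrame, horizon: int, fold: int):
    """Return (train, eval) sets for fold `fold` under the reversed numbering:
    fold 0 evaluates on the most recent `horizon` timestamps of each series.

    Illustrative sketch of the scheme discussed above, not AMLB code.
    """
    g = test.groupby("item_id", sort=False)
    # Fold i: drop the last i * horizon rows of each series to get its test
    # set, and one further horizon to get its train set.
    fold_test = g.head(-fold * horizon) if fold > 0 else test
    fold_train = test.groupby("item_id", sort=False).head(-(fold + 1) * horizon)
    return fold_train, fold_test
```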
> the test set by removing `i * forecast_horizon_in_steps` from each original time series in `test.csv`

This threw me off, since if `i = 0` then there is nothing removed? But you always want at least `forecast_horizon_in_steps` removed. I guess it's just an off-by-one error :) Starting with the full(est) data at fold 0 (as per your table) makes sense to me :)
@PGijsbers I have moved all time series related logic to a new class `TimeSeriesDataset` that inherits from `FileDataset`, and added instructions on how to add new time series tasks to `HOWTO.md`. Please let me know if this looks good to you.
We are currently experiencing some issues with the OpenML server. However, I can confirm that tests pass locally (except for three which explicitly don't use a cache). I'll wait a little longer to see if OpenML issues resolve, but if they do not I'll likely merge tomorrow (and if they do, then I'll run the tests and assuming all passes, merge it too :)).
Thank you @PGijsbers, I appreciate your help and support with this PR!
Merged! Thank you @shchur (and also the people of the original PR) for being patient with us :)
Just a heads up, I will be out of office in August so I may not respond in that period :)
Thanks @PGijsbers for all the help provided in the reviews!! Excited to have this merged in.
This PR contains several improvements to time series forecasting functionality in AMLB and supersedes #507.

- `AutoGluonTS` is merged into `autogluon` with the same installation procedure. The correct predictor object is used depending on the `DatasetType`.
- `quantile_levels` can be defined as a part of the task definition, instead of always setting this argument to `[0.1, 0.2, ..., 0.8, 0.9]`.
- Use the `predictions` column when computing point forecast metrics. The `Result` object in AMLB does not have access to historic information, so we store this extra information as `optional_columns` in the predictions dataframe.
- Explain how to define a time series forecasting task in `HOWTO.md`.

Work left for future PRs