microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/
MIT License

Time-series bug when running tutorials on FLAML==2.1.0 #1225

Open LeonardoEssence opened 9 months ago

LeonardoEssence commented 9 months ago

Discussed in https://github.com/microsoft/FLAML/discussions/1224

Originally posted by **LeonardoEssence** September 21, 2023

I was running some of the AutoML examples from the documentation [here](https://microsoft.github.io/FLAML/docs/Examples/AutoML-Time%20series%20forecast#univariate-time-series), and every time-series example kept failing with a pandas `KeyError`. See below:

```
Traceback (most recent call last):
  File "/mnt/uni_variate_time_series_flaml.py", line 30, in <module>
    automl.fit(dataframe=train_df,  # training data
  File "/opt/conda/lib/python3.9/site-packages/flaml/automl/automl.py", line 1663, in fit
    task.validate_data(
  File "/opt/conda/lib/python3.9/site-packages/flaml/automl/task/time_series_task.py", line 167, in validate_data
    data = TimeSeriesDataset(
  File "/opt/conda/lib/python3.9/site-packages/flaml/automl/time_series/ts_data.py", line 57, in __init__
    self.frequency = pd.infer_freq(train_data[time_col].unique())
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/conda/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'index'
```

I dug into the code and found what I believe is a small bug in the class `TimeSeriesTask`, where it constructs a `TimeSeriesDataset` at line 167 of **time_series_task.py**. `TimeSeriesDataset` expects a data frame containing the training data **and** the timestamp column; however, line 165 concatenates only `Xt` and `yt`, leaving out the time column. That is consistent with the traceback: `train_data[time_col]` in **ts_data.py** fails because the time column (`'index'` here) is no longer in the frame.

I propose changing line 165 from `df_t = pd.concat([Xt, yt], axis=1)` to `df_t = pd.concat([pre_data.all_data[pre_data.time_col], Xt, yt], axis=1)`.

That worked for me. I'm not 100% sure this is the intended behavior, but as it stands, the time-series examples do not run. Is anyone else seeing the same problem, or can anyone offer suggestions?
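The failure mode described above can be reproduced with plain pandas, without FLAML. This is a minimal sketch under stated assumptions: `all_data`, `Xt`, `yt` and `time_col` are hypothetical stand-ins for FLAML's preprocessed data (it assumes `Xt` and `yt` no longer carry the time column, as the post describes), and `infer_frequency` only mirrors the failing call at ts_data.py line 57. It does not reproduce FLAML's actual internals.

```python
import pandas as pd

# Hypothetical stand-in for the preprocessed data: a monthly series whose
# timestamp column is named "index" (the name seen in the KeyError above).
all_data = pd.DataFrame(
    {
        "index": pd.date_range("2000-01-01", periods=6, freq="MS"),
        "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "co2": [310.0, 311.0, 312.0, 313.0, 314.0, 315.0],
    }
)
time_col = "index"

# Assumption: the transformed features and target do not include the time column.
Xt = all_data[["x"]]
yt = all_data["co2"]


def infer_frequency(train_data: pd.DataFrame, time_col: str):
    # Mirrors the failing line: look up the time column, then infer its frequency.
    return pd.infer_freq(train_data[time_col].unique())


# Current behavior: only Xt and yt are concatenated, so the time column is lost.
df_t = pd.concat([Xt, yt], axis=1)
try:
    infer_frequency(df_t, time_col)
    missing = None
except KeyError as err:
    missing = err.args[0]  # the column name pandas could not find
print("current code raises KeyError:", repr(missing))

# Proposed fix: prepend the time column before building the dataset.
df_fixed = pd.concat([all_data[time_col], Xt, yt], axis=1)
print("inferred frequency after fix:", infer_frequency(df_fixed, time_col))
```

Prepending the time column keeps it as the first column of the frame, which matches the order in the proposed one-line change.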