microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/
MIT License

ts_forecast - KeyError/ValueError in predict() #1229

Open tkeyo opened 9 months ago

tkeyo commented 9 months ago

Hi,

I'm using FLAML==2.1.0. The time series forecasting tutorials worked, but with my own data, predict() on the test data raises a KeyError or ValueError.

My assumption is that features are created during fitting. The model is trained with these features, but predict() does not preprocess the test data and instead expects a dataframe that already contains the engineered features.

I think this issue is related to this one: Prediction problem for component models while using ensembling

Is there a workaround, e.g. extracting the transformer to preprocess the data into the expected format?

Input data: contains only 2 columns

Fit code

from flaml import AutoML

automl = AutoML()

settings = {
    "time_budget": 5,
    "metric": "mae",
    "task": "ts_forecast",
    "eval_method": "holdout",
    "seed": 42,
}

automl.fit(
    dataframe=train_df,
    label="y",
    period=time_horizon,
    estimator_list=[
        # "lgbm",
        # "rf",
        # "xgboost",
        # "extra_tree",
        # "xgb_limitdepth",
        "prophet",
        "arima",
        "sarimax",
    ],
    **settings,
)

Predict Code

''' compute predictions of testing dataset '''
y_pred = automl.predict(X_test)
print("Predicted labels", y_pred)
print("True labels", y_test)

Output - prophet

ValueError: Regressor 'index_hour_cos' missing from dataframe

Output - arima

KeyError: "['index_quarter_sin', 'index_month_cos', 'index_sin1', 'index_quarter_cos', 'index_sin2', 'index_month_sin', 'index_cos1', 'index_dayofyear_sin', 'index_cos4', 'index_cos2', 'index_cos3', 'index_sin4', 'index_sin3', 'index_dayofyear_cos'] not in index"

Output - sarimax

KeyError: "['index_hour_cos', 'index_minute_sin', 'index_sin2', 'index_cos1', 'index_hour_sin', 'index_dayofyear_sin', 'index_cos2', 'index_sin3', 'index_dayofyear_cos', 'index_cos3', 'index_dayofweek_sin', 'index_quarter_sin', 'index_month_cos', 'index_second_cos', 'index_sin1', 'index_quarter_cos', 'index_month_sin', 'index_second_sin', 'index_cos4', 'index_minute_cos', 'index_sin4', 'index_dayofweek_cos'] not in index"
jccamargo94 commented 6 months ago

Hi. Is anyone working on this? I'm using flaml==2.1.1 and I'm facing the same problem. I may try to contribute a fix when I have some free time in January.

jccamargo94 commented 5 months ago

This seems to be caused by having the datetime column as the index of the training dataset. After I moved the index into a regular column, fit/predict and (de)serialization worked as expected.
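A minimal sketch of that workaround, assuming the original frame has a DatetimeIndex and a single target column `y`. The helper name `index_to_time_column`, the column name `ds`, and the toy data are hypothetical and only illustrate the reshaping step; the FLAML calls are left as comments because they simply repeat the ones from the report.

```python
import numpy as np
import pandas as pd


def index_to_time_column(df: pd.DataFrame, time_col: str = "ds") -> pd.DataFrame:
    """Return a new frame with the DatetimeIndex moved into the first column."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("expected a DataFrame indexed by a DatetimeIndex")
    # rename_axis names the index; reset_index then inserts it as column 0
    # and replaces the index with a plain RangeIndex.
    return df.rename_axis(time_col).reset_index()


# Toy daily series indexed by timestamp (illustrative data only).
idx = pd.date_range("2023-01-01", periods=48, freq="D")
indexed_df = pd.DataFrame({"y": np.arange(48, dtype=float)}, index=idx)

flat_df = index_to_time_column(indexed_df)

# Hold out the last 12 rows as the test horizon.
time_horizon = 12
train_df = flat_df.iloc[:-time_horizon]
test_df = flat_df.iloc[-time_horizon:]
X_test, y_test = test_df[["ds"]], test_df["y"]

# With FLAML installed, these frames would be used as in the report above:
#   automl.fit(dataframe=train_df, label="y", period=time_horizon, **settings)
#   y_pred = automl.predict(X_test)
```

The point of the reshaping is that the timestamps travel with the data as an ordinary first column, so the same layout is passed to both fit() and predict().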

EgorKraevTransferwise commented 5 months ago

Apologies for taking a while to reply. Can you please also provide code that creates an example dataset of the kind that causes the error (X* and y*), so I can reproduce it locally? In general, FLAML TS expects the time to be provided as a column in the input dataframe (the time_col argument to fit(), defaulting to the first column of the dataframe). Are you saying that the error arises if you do this and the index is also time-valued?
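To help answer that question when posting a reproduction, the layout of a dataframe can be reported with a small check like the one below. This is only a sketch using pandas: the helper name `describe_time_layout` and the toy frames are hypothetical, and it makes no claim about how FLAML itself inspects its input.

```python
import pandas as pd


def describe_time_layout(df: pd.DataFrame, time_col=None) -> dict:
    """Report where the timestamps live: in a column, in the index, or both."""
    # Mirror the stated default: the first column is taken as the time column.
    time_col = df.columns[0] if time_col is None else time_col
    return {
        "time_col": time_col,
        "time_col_is_datetime": bool(
            pd.api.types.is_datetime64_any_dtype(df[time_col])
        ),
        "index_is_datetime": isinstance(df.index, pd.DatetimeIndex),
    }


stamps = pd.date_range("2023-01-01", periods=10, freq="D")
values = list(range(10))

# Time only as the first column (the layout described in the comment above).
column_only = pd.DataFrame({"ds": stamps, "y": values})
# Time as the first column and also as the index.
column_and_index = pd.DataFrame({"ds": stamps, "y": values}, index=stamps)
# Time only in the index, so the first column is the target.
index_only = pd.DataFrame({"y": values}, index=stamps)

for name, frame in [
    ("column_only", column_only),
    ("column_and_index", column_and_index),
    ("index_only", index_only),
]:
    print(name, describe_time_layout(frame))
```

Including this output alongside the dataset-creation code makes it clear which of the three layouts triggers the error.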