unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

[BUG] multi_models=FALSE not working for XGBOOST #2186

Closed: suswamin closed this issue 6 months ago

suswamin commented 7 months ago

Describe the bug
Predicting with future covariates does not work for XGBoost / CatBoost / LightGBM when multi_models=False.

To Reproduce
The attached sample data file (Book1.csv) and the code snippet below reproduce the issue. I tried to use darts.utils.timeseries_generation to generate dummy data but was not very successful, so I attached the sample file instead.

import pandas as pd
import darts
from darts.models import XGBModel

data = pd.read_csv('Book1.csv')
forecast_xgboost = pd.DataFrame()
train = data.iloc[:len(data)-8]                  # all rows except the last 8
predict = data.iloc[len(data)-8:len(data)]       # the last 8 rows
series = darts.TimeSeries.from_series(train['y'])
timeseries_flag_past = darts.TimeSeries.from_series(train['weekday'])
timeseries_flag_future = darts.TimeSeries.from_series(predict['weekday'])
# works fine: multi_models=True
model_XGB = XGBModel(lags_future_covariates=[0], output_chunk_length=7, multi_models=True)
XGB = model_XGB.fit(series, future_covariates=timeseries_flag_past)
pred_xgb = XGB.predict(n=7, future_covariates=timeseries_flag_future)
# does not work: multi_models=False
model_XGB = XGBModel(lags_future_covariates=[0], output_chunk_length=7, multi_models=False)
XGB = model_XGB.fit(series, future_covariates=timeseries_flag_past)
pred_xgb = XGB.predict(n=7, future_covariates=timeseries_flag_future)

Expected behavior
predict() should accept the same future covariates that work with multi_models=True. Instead, with multi_models=False, XGBoost rejects them and raises: "The corresponding future_covariate of the series at index 0 isn't sufficiently long. Given horizon n=7, min(lags_future_covariates)=0, max(lags_future_covariates)=0 and output_chunk_length=7, the future_covariate has to range from 95 until 101 (inclusive), but it ranges only from 101 until 108."

System
Python version: 3.11.5
darts version: 0.25.0
Attachment: Book1.csv

Additional context
N-BEATS with covariates: the N-BEATS API reference does not specify the covariates, yet fit() accepts them while predict() fails:
NBEATS_14 = model_nbeats_14.fit(series, past_covariates=timeseries_flag_past)
pred_nbeats_14 = NBEATS_14.predict(n=7, past_covariates=timeseries_flag_future)

madtoinou commented 6 months ago

Hi @suswamin,

Sorry for the delay, I did not realize that this issue went without answer for so long.

When a model is created with multi_models=False, a single set of coefficients is fitted for all forecasted positions. Because applying one set of coefficients to identical features would yield the exact same output for every position in output_chunk_length, the lags are shifted into the past by output_chunk_length - 1.
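To make the reasoning above concrete, here is a toy linear model (not darts or XGBoost internals; `predict_positions` and all numbers are hypothetical). One shared coefficient vector applied to the same feature row gives the same value at every output position, while feature rows taken from different (shifted) time steps give distinct values.

```python
def predict_positions(coefficients, features_per_position):
    """Apply one shared coefficient vector to the feature row of each output position."""
    return [sum(c * x for c, x in zip(coefficients, feats))
            for feats in features_per_position]


output_chunk_length = 3
coef = [0.5, 2.0]  # a single set of coefficients, as with multi_models=False

# Identical features for every position -> identical outputs.
same_features = [[1.0, 4.0]] * output_chunk_length
print(predict_positions(coef, same_features))     # [8.5, 8.5, 8.5]

# Features read from different time steps -> distinct outputs.
shifted_features = [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]]
print(predict_positions(coef, shifted_features))  # [8.5, 9.0, 9.5]
```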

You can visualize how the lags (target, past_covariates and future_covariates) are generated in the regression example notebook.

This means that, to forecast the 7 values after the end of the training series, the future covariates must cover a different time span depending on whether multi_models is False or True.
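As a rough check of the multi_models=False requirement, the hypothetical helper below (not part of darts) assumes that every lag is shifted into the past by output_chunk_length - 1, so forecast step t reads the covariate at t + lag - shift. It only covers n >= output_chunk_length. With the values from this issue (training series ending at index 100, n=7, lags_future_covariates=[0], output_chunk_length=7) it reproduces the 95..101 range from the error message.

```python
def required_future_cov_range(last_train_idx, n, lags, output_chunk_length):
    """Inclusive (start, end) index range of future covariates needed to
    forecast n steps after last_train_idx with multi_models=False.

    Hypothetical sketch; assumes lags are shifted into the past by
    output_chunk_length - 1 and that n >= output_chunk_length.
    """
    if n < output_chunk_length:
        raise ValueError("this sketch only covers n >= output_chunk_length")
    shift = output_chunk_length - 1
    first_pred = last_train_idx + 1  # first forecast time step
    last_pred = last_train_idx + n   # last forecast time step
    start = first_pred + min(lags) - shift
    end = last_pred + max(lags) - shift
    return start, end


print(required_future_cov_range(100, 7, [0], 7))  # (95, 101)
```

The slice in the snippet below, data.iloc[95:102], covers exactly this range.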

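# reuses `data`, `darts` and the multi_models=False model `XGB` from the reproducer in the issue
validation_start = len(data)-8    # first index after the end of the training series
shift = 7 - 1                     # output_chunk_length - 1
# covariates from validation_start - shift to validation_start (inclusive): 7 values
forecast_validation = data.iloc[validation_start - shift:validation_start + 1]['weekday']
# using the model created with `multi_models=False`
pred_xgb = XGB.predict(n=7, future_covariates=darts.TimeSeries.from_series(forecast_validation))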