unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Ensemble Regression Implementation Issue #1566

Closed SireeshLimbu closed 1 year ago

SireeshLimbu commented 1 year ago

I am trying to build an ensemble of four Darts models, each of which has fit and predict methods.

    # Initialize the models and fit the ensemble

    model1 = XGBModel(lags=[-1, -2, -3], lags_past_covariates=[-1, -2, -3], random_state=42, output_chunk_length=1)
    model2 = LightGBMModel(lags=[-1, -2, -3], lags_past_covariates=[-1, -2, -3], random_state=42, output_chunk_length=1)
    model3 = CatBoostModel(lags=[-1, -2, -3], lags_past_covariates=[-1, -2, -3], lags_future_covariates=None, output_chunk_length=1, random_state=42)
    model4 = BlockRNNModel(input_chunk_length=12, model='GRU', dropout=0.2, random_state=42, output_chunk_length=1)
    forecasting_models = [model1, model2, model3, model4]

    regression_model = RandomForest(lags=None, lags_past_covariates=None, lags_future_covariates=[0], output_chunk_length=1, add_encoders=None, n_estimators=100, max_depth=None)
    model_ensemble = RegressionEnsembleModel(forecasting_models=forecasting_models, regression_train_n_points=38, regression_model=regression_model)
    history = model_ensemble.fit(y_train_t, past_covariates=X_train_t)

    pred = model_ensemble.predict(n=1, series=y_train_t, past_covariates=X_train_t)

I get the error "Unable to build any training samples of the target series and the corresponding covariate series; There is no time step for which all required lags are available and are not NaN values." I have checked thoroughly and there are no NaN values anywhere! I have also tried a plain regression model as the final ensemble regressor. Each model in the ensemble works with this setup when fitted individually.

dennisbader commented 1 year ago

Can you provide a fully minimal working example including some dummy time series to reproduce this issue? I tried and for me the above code runs fine.

SireeshLimbu commented 1 year ago

Hi Dennis! Yes sure thing! I cooked up something quick here. Tried to make it as similar to my situation as possible. I get the same error.

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import (
    BlockRNNModel, CatBoostModel, LightGBMModel,
    RandomForest, RegressionEnsembleModel, XGBModel,
)
from torchmetrics import SymmetricMeanAbsolutePercentageError

# Four dummy columns with 39 values each
col1 = list(np.arange(1, 40))
col2 = list(np.arange(41, 80))
col3 = list(np.arange(81, 120))
col4 = list(np.arange(121, 160))

# 39 month-start timestamps: 2019-08 through 2022-10
month = pd.date_range('2019-08-01', '2022-10-01', freq='1MS').strftime("%Y-%b").tolist()

dt = {'col1': col1, 'col2': col2, 'col3': col3, 'target': col4, 'month': month}
df = pd.DataFrame(dt)
df['month'] = pd.to_datetime(df['month'])
df.index = df['month']
df = df.drop(['month'], axis=1)

# Covariates: every column except col3; target: the 'target' column
X_train_t = TimeSeries.from_dataframe(df, value_cols=list(df.columns.drop(["col3"])), freq="1MS")
y_train_t = TimeSeries.from_dataframe(df, freq='1MS', value_cols='target')

# Initialize the models and fit the ensemble
torch_metrics = SymmetricMeanAbsolutePercentageError()  # not used below
model1 = XGBModel(lags=[-1, -2, -3], lags_past_covariates=[-1, -2, -3], random_state=42, output_chunk_length=1)
model2 = LightGBMModel(lags=[-1, -2, -3], lags_past_covariates=[-1, -2, -3], random_state=42, output_chunk_length=1)
model3 = CatBoostModel(lags=[-1, -2, -3], lags_past_covariates=[-1, -2, -3], lags_future_covariates=None, output_chunk_length=1, random_state=42)
model4 = BlockRNNModel(input_chunk_length=12, model='GRU', dropout=0.2, random_state=42, output_chunk_length=1)
forecasting_models = [model1, model2, model3, model4]

regression_model = RandomForest(lags=None, lags_past_covariates=None, lags_future_covariates=[0], output_chunk_length=1, add_encoders=None, n_estimators=100, max_depth=None)
model_ensemble = RegressionEnsembleModel(forecasting_models=forecasting_models, regression_train_n_points=38, regression_model=regression_model)
history = model_ensemble.fit(y_train_t, past_covariates=X_train_t)

pred = model_ensemble.predict(n=1, series=y_train_t, past_covariates=X_train_t)

I get the following error:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_27/3395707719.py in <module>
     12 regression_model  =RandomForest(lags=None, lags_past_covariates=None, lags_future_covariates=[0], output_chunk_length=1, add_encoders=None, n_estimators=100, max_depth=None)
     13 model_ensemble = RegressionEnsembleModel(forecasting_models = forecasting_models, regression_train_n_points =38,regression_model = regression_model)
---> 14 history = model_ensemble.fit(y_train_t,past_covariates = X_train_t)
     15 
     16 pred = model_ensemble.predict(n=1, series = y_train_t,past_covariates = X_train_t)

/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/regression_ensemble_model.py in fit(self, series, past_covariates, future_covariates)
    119                 series=forecast_training,
    120                 past_covariates=past_covariates,
--> 121                 future_covariates=future_covariates,
    122             )
    123 

/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/forecasting_model.py in _fit_wrapper(self, series, past_covariates, future_covariates)
   1820             future_covariates=future_covariates
   1821             if self.supports_future_covariates
-> 1822             else None,
   1823         )
   1824 

/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/xgboost.py in fit(self, series, past_covariates, future_covariates, val_series, val_past_covariates, val_future_covariates, max_samples_per_ts, **kwargs)
    219             future_covariates=future_covariates,
    220             max_samples_per_ts=max_samples_per_ts,
--> 221             **kwargs,
    222         )
    223 

/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/regression_model.py in fit(self, series, past_covariates, future_covariates, max_samples_per_ts, n_jobs_multioutput_wrapper, **kwargs)
    481 
    482         self._fit_model(
--> 483             series, past_covariates, future_covariates, max_samples_per_ts, **kwargs
    484         )
    485 

/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/regression_model.py in _fit_model(self, target_series, past_covariates, future_covariates, max_samples_per_ts, **kwargs)
    363 
    364         training_samples, training_labels = self._create_lagged_data(
--> 365             target_series, past_covariates, future_covariates, max_samples_per_ts
    366         )
    367 

/opt/conda/lib/python3.7/site-packages/darts/models/forecasting/regression_model.py in _create_lagged_data(self, target_series, past_covariates, future_covariates, max_samples_per_ts)
    334             lags_future_covariates=lags_future_covariates,
    335             max_samples_per_ts=max_samples_per_ts,
--> 336             multi_models=self.multi_models,
    337         )
    338 

/opt/conda/lib/python3.7/site-packages/darts/utils/data/tabularization.py in _create_lagged_data(target_series, output_chunk_length, past_covariates, future_covariates, lags, lags_past_covariates, lags_future_covariates, max_samples_per_ts, is_training, multi_models)
    150             "Unable to build any training samples of the target series "
    151             + (f"at index {idx} " if len(target_series) > 1 else "")
--> 152             + "and the corresponding covariate series; "
    153             "There is no time step for which all required lags are available and are not NaN values.",
    154         )

/opt/conda/lib/python3.7/site-packages/darts/logging.py in raise_if(condition, message, logger)
    102         if `condition` is satisfied
    103     """
--> 104     raise_if_not(not condition, message, logger)
    105 
    106 

/opt/conda/lib/python3.7/site-packages/darts/logging.py in raise_if_not(condition, message, logger)
     76     if not condition:
     77         logger.error("ValueError: " + message)
---> 78         raise ValueError(message)
     79 
     80 

ValueError: Unable to build any training samples of the target series and the corresponding covariate series; There is no time step for which all required lags are available and are not NaN values.

My use case remains the same: a dataframe with around 10 columns, including the 'target' column (the one to be forecast). I turn it into a single TimeSeries object using TimeSeries.from_dataframe, then try to train the ensemble of all four models on all 39 months of data to forecast one month ahead, i.e. the target for '2022-11-01'. Thanks again!

SireeshLimbu commented 1 year ago

Accidentally closed. Please guide!

SireeshLimbu commented 1 year ago

It seems the whole training dataset (here 39 months) is split between the list of forecasting models and the RandomForest ensemble model. The error was thrown because I set 'regression_train_n_points' to 38, which means there is only 1 training data point (39 - 38) left for the list of ML models to train on. That leaves no room for the lags [-1,-2,-3].

After splitting the data properly by setting 'regression_train_n_points' accordingly, I stopped encountering the error!
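
For anyone hitting the same error, the arithmetic behind this split can be checked with a rough sketch before fitting anything. This is plain Python, not Darts code. The two helper functions are hypothetical names made up for illustration, and the alternative value 12 for 'regression_train_n_points' is only an example. It assumes that a lag-based model needs at least as many points as its deepest lag plus output_chunk_length to build one training sample.

```python
def points_left_for_base_models(n_total, regression_train_n_points):
    """Points remaining for the forecasting models once the last
    `regression_train_n_points` are held back for the ensemble regressor."""
    return n_total - regression_train_n_points


def min_points_for_lagged_model(lags, output_chunk_length=1):
    """Smallest series length that yields one training sample:
    the deepest lag as input plus `output_chunk_length` target values."""
    return max(abs(lag) for lag in lags) + output_chunk_length


n_total = 39            # months in the example series
lags = [-1, -2, -3]     # target lags used by the tree-based models
needed = min_points_for_lagged_model(lags, output_chunk_length=1)  # 3 + 1 = 4

# 38 is the value from the failing run; 12 is a hypothetical alternative
for n_reg in (38, 12):
    left = points_left_for_base_models(n_total, n_reg)
    status = "ok" if left >= needed else "too short"
    print(f"regression_train_n_points={n_reg}: {left} left, need {needed} -> {status}")
```

By the same reasoning, the BlockRNNModel with input_chunk_length=12 and output_chunk_length=1 would presumably need at least 13 points, so the remaining share has to cover the most demanding model in the list.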