[BUG] Neither NaiveEnsembleModel nor RegressionEnsembleModel seem to support num_samples > 1

merkoski commented 1 year ago

Describe the bug

I have written software that uses either a CatBoost model or a LightGBM model, each with a quantile likelihood. When I run historical_forecasts on either model with num_samples > 1 (e.g., 10), I get no errors and multiple samples are generated per time step. However, when I pass both models (untrained) to NaiveEnsembleModel or RegressionEnsembleModel and run historical_forecasts, I get an error like:

darts.models.forecasting.forecasting_model ERROR: ValueError: num_samples > 1 is only supported for probabilistic models.

The error is raised by a call to historical_forecasts where model is a NaiveEnsembleModel or a RegressionEnsembleModel. Both fail; setting num_samples to 1 works, but that defeats the purpose of sampling:

backtest = model.historical_forecasts(series, start=0.5, forecast_horizon=5, stride=1, num_samples=10, retrain=True, verbose=True, past_covariates=my_past_covariates)

2022-11-26 20:19:53 darts.models.forecasting.forecasting_model ERROR: ValueError: num_samples > 1 is only supported for probabilistic models.

Expected behavior

I expected NaiveEnsembleModel and RegressionEnsembleModel to support num_samples > 1 in historical_forecasts when the underlying models are probabilistic, because the documentation indicates it is possible. For example, https://unit8co.github.io/darts/generated_api/darts.models.forecasting.baselines.html gives the following signature for NaiveEnsembleModel's historical_forecasts method:

historical_forecasts(series, past_covariates=None, future_covariates=None, num_samples=1, train_length=None, start=0.5, forecast_horizon=1, stride=1, retrain=True, overlap_end=False, last_points_only=True, verbose=False)

num_samples (int) – Number of times a prediction is sampled from a probabilistic model. Should be left set to 1 for deterministic models.
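To make the meaning of num_samples concrete, here is a toy sketch in plain Python. It does not use darts: sample_forecast and quantile are hypothetical helpers, and the Gaussian draws are only an assumption standing in for whatever a probabilistic model would produce. It shows the shape of the result I am after, namely num_samples draws per forecast step from which quantiles can be read.

```python
import random
import statistics


def sample_forecast(n, num_samples, mean=10.0, std=2.0, seed=42):
    """Toy stand-in for a probabilistic forecast (not darts code).

    Returns n time steps, each holding num_samples random draws.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mean, std) for _ in range(num_samples)] for _ in range(n)]


def quantile(samples, q):
    """Simple index-based empirical quantile over the sorted samples."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, max(0, round(q * (len(ordered) - 1))))
    return ordered[idx]


forecast = sample_forecast(n=5, num_samples=10)
for step, samples in enumerate(forecast):
    # With num_samples > 1, each step carries a distribution, not a point.
    print(step, quantile(samples, 0.1), statistics.median(samples), quantile(samples, 0.9))
```

With num_samples=1 each step holds a single draw, which is why that setting cannot give prediction intervals.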

System (please complete the following information):

Python version: [e.g. 3.10]
darts version [e.g. 0.22.0]

Additional context

Stack trace:

2022-11-26 20:19:53 darts.models.forecasting.forecasting_model ERROR: ValueError: `num_samples > 1` is only supported for probabilistic models.

ValueError                                Traceback (most recent call last)
Cell In [78], line 1
----> 1 backtest = model.historical_forecasts(series, start=0.99, forecast_horizon=window, stride=1, num_samples=10, retrain=True, verbose=True, past_covariates=all_covariates) # forecast_horizon must match output_chunk_length
      2 o_backtest = scaler.inverse_transform(backtest)

File ~/Documents/lib/python3.10/site-packages/darts-0.22.0-py3.10.egg/darts/utils/utils.py:178, in _with_sanity_checks.<locals>.decorator.<locals>.sanitized_method(self, *args, **kwargs)
    175 only_args.pop("self")
    177 getattr(self, sanity_check_method)(*only_args.values(), **only_kwargs)
--> 178 return method_to_sanitize(self, *only_args.values(), **only_kwargs)

File ~/Documents/lib/python3.10/site-packages/darts-0.22.0-py3.10.egg/darts/models/forecasting/forecasting_model.py:500, in ForecastingModel.historical_forecasts(self, series, past_covariates, future_covariates, num_samples, train_length, start, forecast_horizon, stride, retrain, overlap_end, last_points_only, verbose)
    481 if (not self._fit_called) or retrain_func(
    482     counter=_counter,
    483     pred_time=pred_time,
   (...)
    492     else None,
    493 ):
    494     self._fit_wrapper(
    495         series=train,
    496         past_covariates=past_covariates,
    497         future_covariates=future_covariates,
    498     )
--> 500 forecast = self._predict_wrapper(
    501     n=forecast_horizon,
    502     series=train,
    503     past_covariates=past_covariates,
    504     future_covariates=future_covariates,
    505     num_samples=num_samples,
    506 )
    508 if last_points_only:
    509     last_points_values.append(forecast.all_values(copy=False)[-1])

File ~/Documents/lib/python3.10/site-packages/darts-0.22.0-py3.10.egg/darts/models/forecasting/forecasting_model.py:1260, in GlobalForecastingModel._predict_wrapper(self, n, series, past_covariates, future_covariates, num_samples)
   1252 def _predict_wrapper(
   1253     self,
   1254     n: int,
   (...)
   1258     num_samples: int,
   1259 ) -> TimeSeries:
-> 1260     return self.predict(
   1261         n,
   1262         series,
   1263         past_covariates=past_covariates,
   1264         future_covariates=future_covariates,
   1265         num_samples=num_samples,
   1266     )

File ~/Documents/lib/python3.10/site-packages/darts-0.22.0-py3.10.egg/darts/models/forecasting/ensemble_model.py:153, in EnsembleModel.predict(self, n, series, past_covariates, future_covariates, num_samples)
    144 def predict(
    145     self,
    146     n: int,
   (...)
    150     num_samples: int = 1,
    151 ) -> Union[TimeSeries, Sequence[TimeSeries]]:
--> 153     super().predict(
    154         n=n,
    155         series=series,
    156         past_covariates=past_covariates,
    157         future_covariates=future_covariates,
    158         num_samples=num_samples,
    159     )
    161     predictions = self._make_multiple_predictions(
    162         n=n,
    163         series=series,
   (...)
    166         num_samples=num_samples,
    167     )
    168     return self.ensemble(predictions, series=series)

File ~/Documents/lib/python3.10/site-packages/darts-0.22.0-py3.10.egg/darts/models/forecasting/forecasting_model.py:1236, in GlobalForecastingModel.predict(self, n, series, past_covariates, future_covariates, num_samples)
   1185 @abstractmethod
   1186 def predict(
   1187     self,
   (...)
   1192     num_samples: int = 1,
   1193 ) -> Union[TimeSeries, Sequence[TimeSeries]]:
   1194     """Forecasts values for n time steps after the end of the series.
   1195
   1196     If :func:fit() has been called with only one TimeSeries as argument, then the series argument of
   (...)
   1234         a sequence where each element contains the corresponding n points forecasts.
   1235     """
-> 1236     super().predict(n, num_samples)
   1237     if self._expect_past_covariates and past_covariates is None:
   1238         raise_log(
   1239             ValueError(
   1240                 "The model has been trained with past covariates. Some matching past_covariates "
   1241                 "have to be provided to predict()."
   1242             )
   1243         )

File ~/Documents/lib/python3.10/site-packages/darts-0.22.0-py3.10.egg/darts/models/forecasting/forecasting_model.py:211, in ForecastingModel.predict(self, n, num_samples)
    201     raise_log(
    202         ValueError(
    203             "The model must be fit before calling predict(). "
   (...)
    207         logger,
    208     )
    210 if not self._is_probabilistic() and num_samples > 1:
--> 211     raise_log(
    212         ValueError(
    213             "num_samples > 1 is only supported for probabilistic models."
    214         ),
    215         logger,
    216     )

File ~/Documents/lib/python3.10/site-packages/darts-0.22.0-py3.10.egg/darts/logging.py:129, in raise_log(exception, logger)
    126 message = str(exception)
    127 logger.error(exception_type + ": " + message)
--> 129 raise exception

ValueError: num_samples > 1 is only supported for probabilistic models.

madtoinou commented 1 year ago

Hi!

It appears that you were indeed declaring your CatBoost and LightGBM models with the quantile likelihood to make them probabilistic.

In order to make NaiveEnsembleModel or RegressionEnsembleModel probabilistic, you have to use the likelihood argument in the models contained in them. By default, it is set to None and the models won't be probabilistic, i.e. they will be unable to generate several samples using the num_samples argument (hence the error message).

Let me know if this solves your problem.

merkoski commented 1 year ago

Thanks for responding, @madtoinou. As I mentioned, I do instantiate CatBoost and LightGBM with likelihood parameters, and they individually work fine. It's when I use them in an ensemble that probabilistic forecasts fail. For example:

likelihood = 'quantile'
model1 = LightGBMModel(lags=lags, output_chunk_length=window, lags_past_covariates=lags, random_state=42, likelihood=likelihood)
model2 = CatBoostModel(lags=lags, output_chunk_length=window, lags_past_covariates=lags, iterations=iterations, random_state=42, likelihood=likelihood)

I can use each of these individually and see probabilistic results. However, when I build an ensemble from the untrained models, it fails during prediction or historical_forecasts.

models = [model1, model2]
models = [m.untrained_model() for m in models]
model = NaiveEnsembleModel(models)
# ... Train, scale, etc
backtest = model.historical_forecasts(series, start=0.5, forecast_horizon=5, stride=1, num_samples=10, retrain=True, verbose=True, past_covariates=my_past_covariates)
# ... Fails per the above error message in this ticket about num_samples

I also tried passing a likelihood parameter to NaiveEnsembleModel, but that fails too.

model = NaiveEnsembleModel(models, likelihood='quantile')

File ~/Documents/lib/python3.10/site-packages/darts-0.22.0-py3.10.egg/darts/models/forecasting/forecasting_model.py:97, in ModelMeta.__call__(cls, *args, **kwargs)
     94 cls._model_call = all_params
     96 # 6) call model
---> 97 return super().__call__(**all_params)

TypeError: NaiveEnsembleModel.__init__() got an unexpected keyword argument 'likelihood'

I hope you can help!

hrzn commented 1 year ago

Hi @merkoski , could you paste a minimal reproducible code snippet? I'd be interested in trying to reproduce your issue directly.

TheNumbersAI commented 1 year ago

Here you go, @hrzn. This code shows the failure. If you set num_samples to 1 and remove the likelihood='quantile' setting from each model, it works. It fails with either NaiveEnsembleModel or RegressionEnsembleModel.

===

from darts.models import LinearRegressionModel, CatBoostModel, RegressionEnsembleModel, NaiveEnsembleModel
from darts.utils.timeseries_generation import linear_timeseries
from darts.dataprocessing.transformers import Scaler

scaler = Scaler()
series = linear_timeseries(length=100)

ts_train, ts_val = series[:50], series[50:]
train_transformed = scaler.fit_transform(ts_train)

past_covariates = series

window = 5
lags = [-1, -2, -5]
num_samples = 100

my_model1 = LinearRegressionModel(lags=lags, output_chunk_length=window, lags_past_covariates=lags, likelihood='quantile')
my_model2 = CatBoostModel(lags=lags, output_chunk_length=window, lags_past_covariates=lags, likelihood='quantile')

my_ensemble_model = NaiveEnsembleModel([my_model1, my_model2])
backtest = my_ensemble_model.historical_forecasts(
    series,
    start=0.5,
    last_points_only=True,
    forecast_horizon=window,
    stride=1,
    verbose=True,
    past_covariates=past_covariates,
    num_samples=num_samples
)

hrzn commented 1 year ago

Thanks for reporting! That's probably a bug in these ensemble models, which don't recognize they can be probabilistic when composed of probabilistic models.
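For readers following along, here is a rough sketch of that failure mode in plain Python. These are toy classes, not the darts implementation: ToyModel, BuggyToyEnsemble and FixedToyEnsemble are hypothetical names, and only the guard in predict mirrors the check visible in the stack trace above. Whether the actual fix delegates with all(...) as shown here is an assumption; the thread does not show the patch.

```python
class ToyModel:
    """Hypothetical stand-in for a forecasting model (not the darts class)."""

    def __init__(self, likelihood=None):
        self.likelihood = likelihood

    def _is_probabilistic(self):
        # A model counts as probabilistic only when a likelihood is set.
        return self.likelihood is not None

    def predict(self, n, num_samples=1):
        # Same kind of guard as the one that raises in the traceback.
        if not self._is_probabilistic() and num_samples > 1:
            raise ValueError(
                "`num_samples > 1` is only supported for probabilistic models."
            )
        return [[0.0] * num_samples for _ in range(n)]


class BuggyToyEnsemble(ToyModel):
    """Inherits the default check, so it never looks probabilistic."""

    def __init__(self, models):
        super().__init__(likelihood=None)
        self.models = models


class FixedToyEnsemble(BuggyToyEnsemble):
    """Assumed fix: probabilistic when every member model is probabilistic."""

    def _is_probabilistic(self):
        return all(m._is_probabilistic() for m in self.models)

    def predict(self, n, num_samples=1):
        super().predict(n, num_samples)  # run the guard first
        preds = [m.predict(n, num_samples) for m in self.models]
        # Naive ensembling: average the members sample by sample.
        return [
            [sum(vals) / len(vals) for vals in zip(*step)]
            for step in zip(*preds)
        ]


members = [ToyModel("quantile"), ToyModel("quantile")]
try:
    BuggyToyEnsemble(members).predict(3, num_samples=10)
    buggy_raised = False
except ValueError:
    buggy_raised = True

fixed_out = FixedToyEnsemble(members).predict(3, num_samples=10)
```

In this sketch the buggy ensemble raises even though both members are probabilistic, while the fixed one returns 3 steps of 10 samples each.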

eliane-maalouf commented 1 year ago

Hello @Jason-Merkoski, as per @hrzn's previous input, a bug fix for this issue was merged and we closed the issue. We hope you will be able to test again and let us know whether it works for you.
