unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

historical_forecasts consuming huge memory #2540

Open Manohar0077 opened 5 days ago

Manohar0077 commented 5 days ago

Does the historical_forecasts function require a large amount of memory?

backtest_results = model.historical_forecasts(
    series=train_series,
    past_covariates=cov,
    forecast_horizon=4,
    num_samples=100,
    last_points_only=False,
    fit_kwargs={'val_series': val_series, 'val_past_covariates': cov},
    verbose=False,
    retrain=True)

When I run this program, it takes a significant amount of time, and the memory usage gradually increases until it eventually crashes. I'm running a global model on 100 time series (a list of 100 time series), each with an average length of 500 timesteps. I have 16GB of memory available. Are there any optimizations or settings that can be applied to reduce the memory consumption of the historical_forecasts function?

madtoinou commented 5 days ago

Hi @Manohar0077,

Since you provide a list of series, historical_forecasts() iteratively creates a list containing all the forecasts. Depending on the length of the forecasted period, this list can become considerably large (especially if start is not specified and the series are long). Also note that with retrain=True the series are processed individually, i.e. the model is retrained on each series independently and there is no global training. You could therefore split the historical forecasts into several parts to reduce the size of the output held in memory at any one time, and then concatenate the results; see the sketch below.
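A minimal sketch of this splitting idea, reusing the model, train_series, and cov objects from the snippet above; the batch size and the idea of persisting each batch to disk are illustrative choices, not darts requirements (fit_kwargs omitted for brevity):

batch_size = 10  # illustrative value
all_backtests = []

for i in range(0, len(train_series), batch_size):
    series_batch = train_series[i : i + batch_size]
    cov_batch = cov[i : i + batch_size]  # assuming cov is a list aligned with train_series

    # Same call as above, restricted to a small batch of series so the list of
    # output forecasts stays small; each batch could also be written to disk
    # here instead of being kept in memory. Specifying start=... would
    # additionally shorten the forecasted period.
    batch_backtests = model.historical_forecasts(
        series=series_batch,
        past_covariates=cov_batch,
        forecast_horizon=4,
        num_samples=100,
        last_points_only=False,
        retrain=True,
        verbose=False,
    )
    all_backtests.extend(batch_backtests)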

Can you try to check if it still crashes with retrain=False? Just to see whether the crash is caused by the memory of the output series alone, or also by the memory required to retrain the model.
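For reference, the diagnostic run would only change the retrain flag; with retrain=False the model needs to have been fitted once beforehand, e.g. with model.fit(train_series, past_covariates=cov):

# Same call as above but without retraining, to see whether the memory
# pressure comes from storing the forecasts or from repeated training.
backtest_results = model.historical_forecasts(
    series=train_series,
    past_covariates=cov,
    forecast_horizon=4,
    num_samples=100,
    last_points_only=False,
    retrain=False,
    verbose=False,
)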

Manohar0077 commented 5 days ago

Hi @madtoinou, thanks for the quick response!

In normal training, when passing a list of time series, a single model is trained for all the series collectively. However, in backtesting with retrain=True, it trains the model individually for each time series. Why is that the case? Is there a way to perform backtesting in the same manner as global training, where the model is trained once for all series? How can we modify the backtest process to conduct global training internally?

madtoinou commented 5 days ago

Correct. There are several reasons for this design.

You can find some thoughts and observations about this feature in #1538; one of the suggested solutions would be to distinguish the training series from the validation series in historical_forecasts(), but this is not a priority at the moment since retraining is often very time consuming when multiple series are used, making the historical_forecasts() runtime unreasonably long if it happens several times.

You can probably modify the source code here, but I would recommend being careful with the series' time indexes to avoid data leakage (slicing all the series at the step before the beginning of the forecast horizon seems like the safest approach, but you would then need to handle the empty ones and the associated covariates). If you want to apply the historical forecasts to all the series, you will also need to swap the order of the loops and compute the extremities of the "forecastable" indexes. A lot of corner cases will come up. A rough sketch of this idea is shown below.
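To illustrate the loop-swapping idea, here is a very rough sketch of a manual "global" backtest, assuming all series share the same time index and that the model exposes untrained_model() to create a fresh copy; the choice of cutoffs and the variable names are illustrative, and empty slices / covariate alignment would still need proper handling to avoid data leakage:

forecast_horizon = 4
stride = 4
# Illustrative cutoffs: every `stride` steps, starting after 100 points of
# history and stopping early enough to leave room for the forecast horizon.
cutoffs = train_series[0].time_index[100:-forecast_horizon:stride]

global_backtests = {}
for cutoff in cutoffs:
    # Keep only the history strictly before the cutoff for every series.
    train_slices = [s.drop_after(cutoff) for s in train_series]

    # Retrain a fresh global model on all truncated series at once.
    model_copy = model.untrained_model()
    model_copy.fit(train_slices, past_covariates=cov)

    # Forecast the horizon for every series from this cutoff.
    global_backtests[cutoff] = model_copy.predict(
        n=forecast_horizon,
        series=train_slices,
        past_covariates=cov,
        num_samples=100,
    )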