unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

historical_forecasts consuming huge memory #2540

Open Manohar0077 opened 5 days ago

Manohar0077 commented 5 days ago

Does the historical_forecasts function require a large amount of memory?

backtest_results = model.historical_forecasts(
    series=train_series,
    past_covariates=cov,
    forecast_horizon=4,
    num_samples=100,
    last_points_only=False,
    fit_kwargs={'val_series': val_series, 'val_past_covariates': cov},
    verbose=False,
    retrain=True)

When I run this program, it takes a significant amount of time, and the memory usage gradually increases until it eventually crashes. I'm running a global model on 100 time series (a list of 100 time series), each with an average length of 500 timesteps. I have 16GB of memory available. Are there any optimizations or settings that can be applied to reduce the memory consumption of the historical_forecasts function?

madtoinou commented 5 days ago

Hi @Manohar0077,

Since you provide a list of series, historical_forecasts() iteratively creates a list containing all the forecasts. Depending on the length of the forecasted period, this list can become considerably large (especially if start is not specified and the series are long). Also note that with retrain=True the series are processed individually, i.e. the model is retrained on each series independently and there is no global training. You could therefore split the historical forecasts into several parts to reduce the size of the output held in memory at any one time, and then concatenate the results; see the sketch below.
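A minimal sketch of this splitting idea, reusing the model, train_series, and cov objects from the snippet above; the batch size and the idea of persisting each batch to disk are illustrative choices, not darts requirements (fit_kwargs omitted for brevity):

batch_size = 10  # illustrative value
all_backtests = []

for i in range(0, len(train_series), batch_size):
    series_batch = train_series[i : i + batch_size]
    cov_batch = cov[i : i + batch_size]  # assuming cov is a list aligned with train_series

    # Same call as above, restricted to a small batch of series so the list of
    # output forecasts stays small; each batch could also be written to disk
    # here instead of being kept in memory. Specifying start=... would
    # additionally shorten the forecasted period.
    batch_backtests = model.historical_forecasts(
        series=series_batch,
        past_covariates=cov_batch,
        forecast_horizon=4,
        num_samples=100,
        last_points_only=False,
        retrain=True,
        verbose=False,
    )
    all_backtests.extend(batch_backtests)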

Can you try to check if it still crashes with retrain=False? Just to see whether the crash is caused by the memory of the output series alone, or also by the memory required to retrain the model.
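For reference, the diagnostic run would only change the retrain flag; with retrain=False the model needs to have been fitted once beforehand, e.g. with model.fit(train_series, past_covariates=cov):

# Same call as above but without retraining, to see whether the memory
# pressure comes from storing the forecasts or from repeated training.
backtest_results = model.historical_forecasts(
    series=train_series,
    past_covariates=cov,
    forecast_horizon=4,
    num_samples=100,
    last_points_only=False,
    retrain=False,
    verbose=False,
)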

Manohar0077 commented 5 days ago

Hi @madtoinou, thanks for the quick response!

In normal training, when passing a list of time series, a single model is trained for all the series collectively. However, in backtesting with retrain=True, it trains the model individually for each time series. Why is that the case? Is there a way to perform backtesting in the same manner as global training, where the model is trained once for all series? How can we modify the backtest process to conduct global training internally?

madtoinou commented 5 days ago

Correct. There are several reasons for this design.

You can find some thoughts and observations about this feature in #1538; one of the suggested solutions would be to distinguish the training series from the validation series in historical_forecasts(), but this is not a priority at the moment since retraining is often very time consuming when multiple series are used, making the historical_forecasts() runtime unreasonably long if it happens several times.

You can probably modify the source code here, but I would recommend being careful with the series' time indexes to avoid data leakage (slicing all the series at the step before the beginning of the forecast horizon seems like the safest approach, but you would then need to handle the empty ones and the associated covariates). If you want to apply the historical forecasts to all the series, you will also need to swap the order of the loops and compute the extremities of the "forecastable" indexes. A lot of corner cases will come up. A rough sketch of this idea is shown below.
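To illustrate the loop-swapping idea, here is a very rough sketch of a manual "global" backtest, assuming all series share the same time index and that the model exposes untrained_model() to create a fresh copy; the choice of cutoffs and the variable names are illustrative, and empty slices / covariate alignment would still need proper handling to avoid data leakage:

forecast_horizon = 4
stride = 4
# Illustrative cutoffs: every `stride` steps, starting after 100 points of
# history and stopping early enough to leave room for the forecast horizon.
cutoffs = train_series[0].time_index[100:-forecast_horizon:stride]

global_backtests = {}
for cutoff in cutoffs:
    # Keep only the history strictly before the cutoff for every series.
    train_slices = [s.drop_after(cutoff) for s in train_series]

    # Retrain a fresh global model on all truncated series at once.
    model_copy = model.untrained_model()
    model_copy.fit(train_slices, past_covariates=cov)

    # Forecast the horizon for every series from this cutoff.
    global_backtests[cutoff] = model_copy.predict(
        n=forecast_horizon,
        series=train_slices,
        past_covariates=cov,
        num_samples=100,
    )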