Open hrzn opened 2 years ago
Hi @FBruzzesi, thanks for the proposal!
If you want to take a stab at it, you can check out how parallelisation is done in `gridsearch()` with `num_jobs`.
For `historical_forecasts()`, we could follow a similar approach. There are two iterators at the moment: an "outer" one here over the time series, and an "inner" one here iterating over the train/validation splits.
I think we could probably refactor this a bit to put the two for-loops inside a generator, producing the right elements for each iteration step (e.g. yielding the right split index of the train/val sub-series). We could also extract the processing logic into an inner function, which would be called from `_parallel_apply()`, so something like:

```python
results = _parallel_apply(iterator, _train_and_forecast, n_jobs, ...)
```

where `_train_and_forecast()` contains the logic to optionally re-train the model and obtain the forecast.
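To make the suggested shape concrete, here is a minimal, self-contained sketch. The generator `_splits`, the worker, and the `_parallel_apply` signature are assumptions based on the description above, not Darts' actual implementation, and a placeholder "model" (the mean of the train split) stands in for real training and forecasting:

```python
from concurrent.futures import ThreadPoolExecutor

def _parallel_apply(iterator, fn, n_jobs):
    # Map fn over every element yielded by the (flattened) iterator,
    # dispatching the calls across n_jobs workers.
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        return list(executor.map(fn, iterator))

def _splits(series_list, split_points):
    # Generator flattening the two nested loops: yields one
    # (series index, train, val) tuple per historical-forecast step.
    for i, series in enumerate(series_list):
        for t in split_points:
            yield i, series[:t], series[t:]

def _train_and_forecast(item):
    # Hypothetical worker: (re)train on the train split and forecast.
    # The mean of the train split stands in for a real model's output.
    i, train, val = item
    return i, sum(train) / len(train)

series_list = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
results = _parallel_apply(_splits(series_list, [2, 3]), _train_and_forecast, n_jobs=2)
# one result per (series, split point) pair, in iteration order
```

Since `executor.map` preserves the iteration order, the results line up with the splits the generator produced, which keeps reassembling the per-series forecasts straightforward.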
Hi @hrzn, it looks like this issue is stale; if that's the case, I'd love to pick it up. As for the code itself, I've been browsing it, and the `retrain` parameter definitely raises the complexity. Here is my current rough implementation overview for making both loops async:
Let me know if this makes sense. By the way, with the university winter break ending and this issue being harder than what I'm used to, it might take me a while (1-2 months) to put the code together.
@JanFidor we currently have some plans to refactor `historical_forecasts()` for Torch models and RegressionModels, in order to make better use of batching and vectorization. This should bring drastic speedups and would also allow more efficient use of multiple CPU cores. There might still be some value in parallelizing computation for the other models (typically the "non-ML" models, where things cannot really be vectorized easily). However, my suggestion would be to wait until we have first implemented the vectorized approaches for the ML models. That might bring quite a lot of changes to this function, and we will see more clearly afterwards what remains to be done.
Ping @dennisbader
@hrzn Sure thing! Changing anything in `historical_forecasts()` just before a major refactor would definitely be a waste of time. I'll stay put for now and revisit the topic if there's an actual use case for parallelisation after the refactor is merged.
Hey folks - what's the latest on this?
Hey, any further plans for this one?
The latest updates:
- `retrain=False`
- `metrics_kwargs` and `n_jobs`

The only parallelization left to do (which I'm not even 100% sure will work with all models) is for:
- `retrain != False`
- `forecast_horizon > output_chunk_length`: this has to go into the dedicated optimized historical forecasts routine for regression models
Hey @hrzn, I would be happy to give it a try.
Just to be clear, I imagine we can parallelise the iterate-and-forecast part, i.e. the loop at line 472.
However, while this is almost straightforward when `retrain=False`, I wouldn't be sure how to approach it if retraining is involved, since a given trained model should only predict up until the next retrain point.
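One possible way to handle the retrain case (a hedged sketch, not anything from the Darts codebase) is to group the forecast steps into chunks that each begin at a retrain point: within a chunk the same fitted model produces all forecasts sequentially, while the chunks themselves are independent and can run in parallel. All names below are hypothetical, and the "model" is a placeholder mean:

```python
from concurrent.futures import ThreadPoolExecutor

def _chunk_by_retrain(split_points, retrain_every):
    # Group forecast steps so each chunk starts at a (re)training point;
    # every forecast inside a chunk uses the model fitted at its start.
    chunks = []
    for i, t in enumerate(split_points):
        if i % retrain_every == 0:
            chunks.append([])
        chunks[-1].append(t)
    return chunks

def _forecast_chunk(series, chunk):
    # Hypothetical worker: "train" once on the data up to the chunk's
    # first split point (placeholder: the mean), then forecast each step
    # in the chunk sequentially with that same fitted model.
    train_end = chunk[0]
    fitted = sum(series[:train_end]) / train_end
    return [(t, fitted) for t in chunk]

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
chunks = _chunk_by_retrain([2, 3, 4, 5], retrain_every=2)
with ThreadPoolExecutor(max_workers=2) as executor:
    parts = executor.map(lambda c: _forecast_chunk(series, c), chunks)
    results = [step for part in parts for step in part]
```

The design trade-off is that the unit of parallelism becomes the retrain interval rather than the individual forecast step, so the speedup shrinks as `retrain_every` grows; with `retrain=False` there is a single chunk and this degenerates back to the sequential case.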