Hi @jlopezpena,
The training series is indeed stored in the self.training_series attribute if the model is fitted on a single series. It simplifies the prediction step, during which Darts can assume that the user wants to forecast n values after the end of this training series, so the user doesn't have to pass it again.
You can easily overwrite/remove it before saving your model with model.training_series = None. The only downside is that you will always have to provide an input series during inference. The covariates are also stored in self.past_covariate_series and self.future_covariate_series; you can remove them with the same approach (see the sketch below).
We could indeed make it an argument/dedicated method to make it a bit simpler.
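For reference, a minimal sketch of that workaround (the file name and the loading class are placeholders, assuming a model fitted on a single series):

```python
# Drop the stored series so the saved file only contains the fitted model.
# After this, a target series (and covariates) must always be passed to predict().
model.training_series = None
model.past_covariate_series = None
model.future_covariate_series = None

model.save("model_without_data.pkl")

# At inference time, provide the input series explicitly, e.g.:
# loaded = LightGBMModel.load("model_without_data.pkl")
# loaded.predict(n=12, series=my_series, past_covariates=my_past_covariates)
```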
Thanks for your answer @madtoinou! Providing a series for inference is fine, and it is actually the desired way of operation for a global forecasting model that has been trained on multiple series. Will test your suggestion and report back on the outcome!
Just FYI, if the model has been trained on multiple series, the training series/covariates are not saved :)
Yeah, I just realised that. It looks like it is not the training data but multiple copies of the trained model being stored, one per prediction horizon: I have about 30 copies of LGBMRegressor, each taking about 35 MB, stored in the model.model.estimators_ attribute. Probably not much can be done about that, unfortunately 😞
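Something like this rough snippet can confirm where the size is going (assuming a fitted Darts regression model whose underlying sklearn wrapper exposes estimators_):

```python
import pickle

# Inspect the per-step estimators stored on the fitted model.
estimators = model.model.estimators_
print(f"number of per-step estimators: {len(estimators)}")
for i, est in enumerate(estimators):
    size_mb = len(pickle.dumps(est)) / 1e6  # approximate serialized size
    print(f"estimator {i}: ~{size_mb:.1f} MB")
```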
You can actually have an impact on this, but the performance of the model is likely to decrease a little bit: there are 30 models, one for each step/position in the output_chunk_length. If you set multi_models=False when you create the model, only one of them will be created and the lags will be shifted in the past for each position (see illustration, and the sketch below).
There is a trade-off between model size and performance, sadly. Since you have such a long ocl, you might be able to reduce the size of the model a bit by reducing the number of lags (input features), but again, this will probably negatively impact the model's forecasting capabilities.
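As a sketch of what that could look like (the lag value and series name here are placeholders, not a recommendation):

```python
from darts.models import LightGBMModel

# With multi_models=False, a single LGBMRegressor is trained and reused for every
# position of the output chunk (the lags are shifted instead), so the saved model
# holds one estimator rather than output_chunk_length of them.
model = LightGBMModel(
    lags=24,                  # fewer lags -> fewer input features -> smaller estimator
    output_chunk_length=30,
    multi_models=False,
)
model.fit(train_series)
model.save("smaller_model.pkl")
```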
Closing this issue, just realized it's a duplicate of #1836.
Is your feature request related to a current problem? Please describe.
Training a global forecasting model that uses LightGBM on a relatively large dataset, and then saving the resulting trained model, results in a massive file (over 1 GB). The model itself is not complex enough to warrant this size, so I am guessing Darts is storing the training dataset alongside the trained model. This is inconvenient for a couple of reasons.

Describe proposed solution
There should be an option passed to the model.save method, or even better, a dedicated method (something like model.prune()) that would get rid of any and all data artifacts that are not required for inference. As this might break some existing functionality that could rely on the dataset being present, it would be acceptable to have this "pruned model" be a separate class with reduced functionality: basically, just the stuff needed for prediction, with no need for further training, backtesting, or anything like that. If a model is needed for those purposes, the full thing can still be stored, but a thin alternative for deployment would be very useful.
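To illustrate the kind of thing being asked for, a hypothetical helper (prune_for_inference is not part of Darts; it just clears the attributes mentioned earlier in this thread):

```python
# Hypothetical sketch of the requested "prune" step, not an existing Darts API.
def prune_for_inference(model):
    """Strip stored training-time series so the saved file only contains
    what predict() needs (an input series must then be passed explicitly)."""
    for attr in ("training_series", "past_covariate_series", "future_covariate_series"):
        if hasattr(model, attr):
            setattr(model, attr, None)
    return model

prune_for_inference(model).save("model_for_deployment.pkl")
```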