winedarksea / AutoTS

Automated Time Series Forecasting
MIT License

Running out of RAM in 0.6.3 #210

Open emobs opened 7 months ago

emobs commented 7 months ago

Upgraded to 0.6.3 today and also updated all dependencies.

With AutoTS 0.6.3, model.fit() runs fine on the same data set as before. Average memory usage stays below 4 GB throughout model generations and validations, except for the following model, which tries to allocate 49 GB of RAM and therefore crashes AutoTS:

{"model_number": 386, "model_name": "MultivariateMotif", "model_param_dict": {"window": 10, "point_method": "midhinge", "distance_metric": "hamming", "k": 20, "max_windows": 10000}, "model_transform_dict": {"fillna": "SeasonalityMotifImputerLinMix", "transformations": {"0": "bkfilter", "1": "AlignLastValue", "2": "AlignLastValue", "3": "bkfilter"}, "transformation_params": {"0": {}, "1": {"rows": 7, "lag": 1, "method": "additive", "strength": 1.0, "first_value_only": false}, "2": {"rows": 1, "lag": 28, "method": "additive", "strength": 1.0, "first_value_only": false}, "3": {}}}}

This never happened across many test runs on 0.6.2 with the same AutoTS setup and dataset. Is this a bug, or can I somehow prevent this validation from causing a crash?

Would it be possible, in a future version of AutoTS, to add a safeguard that skips a model or validation method when it is about to run out of RAM?
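
In case it helps with debugging, here is a minimal sketch of how the offending model could be reproduced in isolation with autots.model_forecast, as I understand its signature (df_train stands for the same wide-format training DataFrame passed to model.fit(); the forecast_length value is just a placeholder):

```python
from autots import model_forecast

# Parameters copied from the failing model logged above (model_number 386).
model_name = "MultivariateMotif"
model_param_dict = {
    "window": 10,
    "point_method": "midhinge",
    "distance_metric": "hamming",
    "k": 20,
    "max_windows": 10000,
}
model_transform_dict = {
    "fillna": "SeasonalityMotifImputerLinMix",
    "transformations": {"0": "bkfilter", "1": "AlignLastValue", "2": "AlignLastValue", "3": "bkfilter"},
    "transformation_params": {
        "0": {},
        "1": {"rows": 7, "lag": 1, "method": "additive", "strength": 1.0, "first_value_only": False},
        "2": {"rows": 1, "lag": 28, "method": "additive", "strength": 1.0, "first_value_only": False},
        "3": {},
    },
}

# df_train is assumed to already exist; forecast_length is a placeholder.
prediction = model_forecast(
    model_name=model_name,
    model_param_dict=model_param_dict,
    model_transform_dict=model_transform_dict,
    df_train=df_train,
    forecast_length=28,
)
```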

winedarksea commented 7 months ago

Thanks for the quick and specific identification. I have had RAM issues with MultivariateMotif in the past, but it is unchanged in 0.6.3; my workaround was BallTreeMultivariateMotif, which you could try dropping in its place. However, I suspect the real issue is SeasonalityMotifImputerLinMix, which I changed in this release to try to make it more RAM friendly, but apparently that isn't the case.
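
A minimal sketch of one way to try that swap, assuming a custom model_list passed to the AutoTS constructor (forecast_length and the other model names here are placeholders, not a recommended set):

```python
from autots import AutoTS

# Hypothetical custom model list: BallTreeMultivariateMotif substituted
# for MultivariateMotif, alongside whatever other models are normally run.
model = AutoTS(
    forecast_length=28,  # placeholder
    model_list=[
        "BallTreeMultivariateMotif",  # RAM-friendlier replacement
        "SeasonalNaive",
        "GLM",
    ],
)
# model = model.fit(df)  # df: the same wide-format data as before
```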

Let me run it and see.

winedarksea commented 7 months ago

--- some running later --- It appears MultivariateMotif is indeed the issue. The ONLY change I made was that it previously used Parallel(self.n_jobs - 1) and I switched it to the standard Parallel(self.n_jobs), so try setting your n_jobs one lower and see if that helps. It ultimately comes down to a memory issue and the amount of memory per worker; reducing your n_jobs is the most expedient fix at the moment.
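
A minimal sketch of that workaround (the cpu_count-based value shown here is only an illustration; any explicit n_jobs one below your usual setting amounts to the same thing):

```python
from multiprocessing import cpu_count

from autots import AutoTS

# Use one fewer worker than before to leave more memory headroom per worker.
model = AutoTS(
    forecast_length=28,              # placeholder
    n_jobs=max(1, cpu_count() - 2),  # one less than the previous cpu_count - 1
)
```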

emobs commented 7 months ago

Thanks Colin, I ran the fit again with n_jobs reduced by one, and this time it didn't crash on the MultivariateMotif model in Validation 1, so that's good. However, it does crash on the GLM model, according to the last saved CurrentModel:

{"model_number": 323, "model_name": "GLM", "model_param_dict": {"family": "Gaussian", "constant": false, "regression_type": "datepart"}, "model_transform_dict": {"fillna": "SeasonalityMotifImputerLinMix", "transformations": {"0": "AlignLastValue", "1": "AnomalyRemoval", "2": "SeasonalDifference"}, "transformation_params": {"0": {"rows": 1, "lag": 1, "method": "additive", "strength": 1.0, "first_value_only": false}, "1": {"method": "zscore", "method_params": {"distribution": "chi2", "alpha": 0.1}, "fillna": "linear", "transform_dict": {"fillna": "rolling_mean_24", "transformations": {"0": "RegressionFilter"}, "transformation_params": {"0": {"sigma": 2, "rolling_window": 90, "run_order": "season_first", "regression_params": {"regression_model": {"model": "DecisionTree", "model_params": {"max_depth": 3, "min_samples_split": 0.05}}, "datepart_method": "simple_binarized", "polynomial_degree": null, "transform_dict": null, "holiday_countries_used": false}, "holiday_params": null}}}}, "2": {"lag_1": 7, "method": "Mean"}}}}

Here too, memory suddenly peaks after staying steady at roughly 10% usage throughout the full run. [Screenshots attached of the first peak and of the crash, taken from the real-time Ubuntu Resources monitor.]

Hope this helps pinpoint the cause of the crash. Thanks in advance for any reply.

emobs commented 7 months ago

--- after another run --- I tested once again, Colin, this time without a transformer_list parameter set in the model initialization (it used to be 'superfast' on 0.6.2, and I changed it to 'scalable' on 0.6.3 as per your advice by email a little while ago). Without a value set for transformer_list, AutoTS doesn't crash. I also tried running with transformer_list 'superfast' (which I used on 0.6.2) again on 0.6.3, and that didn't crash either. So it seems the issue is related to the new 'scalable' transformer_list in this case.
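
For reference, a minimal sketch of the configurations being compared (forecast_length is a placeholder; the actual runs used my usual settings):

```python
from autots import AutoTS

# Run that crashed: the new, larger 'scalable' transformer group.
model_scalable = AutoTS(forecast_length=28, transformer_list="scalable")

# Runs that completed: the 0.6.2 setting, and the library default.
model_superfast = AutoTS(forecast_length=28, transformer_list="superfast")
model_default = AutoTS(forecast_length=28)  # no transformer_list passed
```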

winedarksea commented 7 months ago


Scalable is a much larger group of transformers than superfast, so it should be more accurate, but it looks like I still have more work to do in chasing down parameter combinations that use too much memory. I suspect it is the "AnomalyRemoval" --> "RegressionFilter" combination that is causing the problems.
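
A possible interim workaround, sketched here under the assumption that transformer_list also accepts an explicit list of transformer names rather than only a named group: keep a broader set of transformers but leave out the suspected offenders.

```python
from autots import AutoTS

# Hypothetical custom transformer list built only from transformers already
# seen in this thread, with RegressionFilter / AnomalyRemoval left out.
custom_transformers = [
    "AlignLastValue",
    "SeasonalDifference",
    "bkfilter",
]

model = AutoTS(
    forecast_length=28,  # placeholder
    transformer_list=custom_transformers,
)
```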

emobs commented 7 months ago

I probably can't help you with this, but if there's a way I can contribute, let me know!