winedarksea / AutoTS

Automated Time Series Forecasting
MIT License

Running out of RAM in 0.6.3 #210

Open emobs opened 11 months ago

emobs commented 11 months ago

Upgraded to 0.6.3 today and also updated all dependencies.

Using AutoTS 0.6.3, model.fit() runs fine like it did before on the same data set. Average memory usage is less than 4 GB throughout model generations and validations, except for this model, which tries to allocate 49 GB of RAM and therefore crashes AutoTS:

{"model_number": 386, "model_name": "MultivariateMotif", "model_param_dict": {"window": 10, "point_method": "midhinge", "distance_metric": "hamming", "k": 20, "max_windows": 10000}, "model_transform_dict": {"fillna": "SeasonalityMotifImputerLinMix", "transformations": {"0": "bkfilter", "1": "AlignLastValue", "2": "AlignLastValue", "3": "bkfilter"}, "transformation_params": {"0": {}, "1": {"rows": 7, "lag": 1, "method": "additive", "strength": 1.0, "first_value_only": false}, "2": {"rows": 1, "lag": 28, "method": "additive", "strength": 1.0, "first_value_only": false}, "3": {}}}}

This never happened in many test runs on 0.6.2 with the same AutoTS model and dataset. Is this a bug, or can I prevent this validation from causing a crash somehow?

Would it be possible to implement a precaution in a future version of AutoTS that skips validation methods or models when running out of RAM?

winedarksea commented 11 months ago

Thanks for the quick and specific identification. I have had RAM issues with MultivariateMotif in the past, but it is unchanged in 0.6.3; my solution instead was BallTreeMultivariateMotif, which you could try dropping in its place. However, I suspect the real issue is SeasonalityMotifImputerLinMix, which I changed in an attempt to make it more RAM friendly, but apparently that's not the case.

Let me run it and see.
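
If it's easier to reproduce outside a full search, here is a rough sketch of re-running only the reported model via the model_forecast helper (df, forecast_length, and frequency below are placeholders, not values from your run):

```python
from autots import model_forecast

# df = ...  # placeholder: your wide-format DataFrame with a DatetimeIndex

prediction = model_forecast(
    model_name="MultivariateMotif",
    model_param_dict={
        "window": 10,
        "point_method": "midhinge",
        "distance_metric": "hamming",
        "k": 20,
        "max_windows": 10000,
    },
    model_transform_dict={
        "fillna": "SeasonalityMotifImputerLinMix",
        "transformations": {"0": "bkfilter", "1": "AlignLastValue", "2": "AlignLastValue", "3": "bkfilter"},
        "transformation_params": {
            "0": {},
            "1": {"rows": 7, "lag": 1, "method": "additive", "strength": 1.0, "first_value_only": False},
            "2": {"rows": 1, "lag": 28, "method": "additive", "strength": 1.0, "first_value_only": False},
            "3": {},
        },
    },
    df_train=df,
    forecast_length=28,   # placeholder horizon
    frequency="infer",
    n_jobs=1,             # single worker makes this one model's memory use easy to watch
)
# prediction.forecast holds the point forecasts
```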

winedarksea commented 11 months ago

--- some running later --- It appears MultivariateMotif is indeed the issue. The ONLY change I made was that it previously had Parallel(self.n_jobs - 1) and I switched it to the standard Parallel(self.n_jobs), so you should try setting your n_jobs one lower and seeing if that helps. It ultimately comes down to a memory issue and the amount of memory per worker. Reducing your n_jobs is the most expedient fix at the moment.
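
In code terms, the workaround is just passing a smaller n_jobs when creating the model, something like this (all arguments other than n_jobs are illustrative):

```python
import os

from autots import AutoTS

# Workaround sketch: give AutoTS one fewer worker so each joblib worker
# has more memory headroom. forecast_length and transformer_list are illustrative.
model = AutoTS(
    forecast_length=28,
    frequency="infer",
    transformer_list="scalable",
    n_jobs=max((os.cpu_count() or 2) - 1, 1),
)
# model = model.fit(df)
```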

emobs commented 11 months ago

Thanks Colin, I tried running the fit again with n_jobs - 1, and this time it didn't crash on the MultivariateMotif model in Validation 1, so that's good. However, according to the last saved CurrentModel, it does crash on the GLM model:

{"model_number": 323, "model_name": "GLM", "model_param_dict": {"family": "Gaussian", "constant": false, "regression_type": "datepart"}, "model_transform_dict": {"fillna": "SeasonalityMotifImputerLinMix", "transformations": {"0": "AlignLastValue", "1": "AnomalyRemoval", "2": "SeasonalDifference"}, "transformation_params": {"0": {"rows": 1, "lag": 1, "method": "additive", "strength": 1.0, "first_value_only": false}, "1": {"method": "zscore", "method_params": {"distribution": "chi2", "alpha": 0.1}, "fillna": "linear", "transform_dict": {"fillna": "rolling_mean_24", "transformations": {"0": "RegressionFilter"}, "transformation_params": {"0": {"sigma": 2, "rolling_window": 90, "run_order": "season_first", "regression_params": {"regression_model": {"model": "DecisionTree", "model_params": {"max_depth": 3, "min_samples_split": 0.05}}, "datepart_method": "simple_binarized", "polynomial_degree": null, "transform_dict": null, "holiday_countries_used": false}, "holiday_params": null}}}}, "2": {"lag_1": 7, "method": "Mean"}}}}

Here too, memory peaks sharply after staying steady at roughly 10% usage throughout the full run. (Screenshots of the first peak and the crash, taken from the real-time Ubuntu Resources monitor, were attached.)

Hope this helps to pinpoint the cause of this crash. Thanks in advance for any reply.

emobs commented 11 months ago

--- after another run --- I tested once again, Colin, this time without a transformer_list parameter set in the model initialization (it used to be 'superfast' on 0.6.2 and was changed to 'scalable' on 0.6.3 as per your advice by email a little while ago). Without a value set for transformer_list, AutoTS doesn't crash. I also tried running with transformer_list 'superfast' (which I used on 0.6.2) again on 0.6.3, which didn't crash either. So it seems the issue is related to the new 'scalable' transformer_list in this case.
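
For reference, a minimal sketch of the two configurations compared here (forecast_length is a placeholder, not the exact value used):

```python
from autots import AutoTS

# 'scalable' (the setting suggested for 0.6.3) triggered the RAM spike in this run,
# while 'superfast' (the old 0.6.2 setting) and the package default did not.
crashing_run = AutoTS(forecast_length=28, transformer_list="scalable")
working_run = AutoTS(forecast_length=28, transformer_list="superfast")
```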

winedarksea commented 11 months ago


Scalable is a much larger group of transformers than superfast, so it should be more accurate, but it looks like I still have more work to do in chasing down parameter combinations that lead to too much memory use. I suspect it is the "AnomalyRemoval" --> "RegressionFilter" combination that is causing the problems.
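
One way to test that suspicion without waiting for a fix (a sketch, not an official workaround): transformer_list also accepts an explicit list of transformer names, so a slimmed-down list that omits AnomalyRemoval and RegressionFilter can stand in for 'scalable'; the names below are an illustrative subset only.

```python
from autots import AutoTS

# Illustrative subset of transformers, deliberately excluding the suspected
# AnomalyRemoval -> RegressionFilter combination.
slim_transformers = [
    "AlignLastValue",
    "SeasonalDifference",
    "bkfilter",
    "DifferencedTransformer",
]
model = AutoTS(
    forecast_length=28,                 # placeholder
    transformer_list=slim_transformers,
)
```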

emobs commented 11 months ago

I probably can't help you with this, but if there's a way I can contribute, let me know!