Open ANNIKADAHLMANN-8451 opened 3 months ago
UPDATE: I can run the notebook successfully with Spark version 3.1.2, but I get this error on newer versions of Spark in my Databricks cluster. Is there any documentation or support for running AutoGluon on newer Spark versions?
I'm running into an AttributeError when trying to fit an AutoGluon time series model in Databricks. The error occurs specifically when calling .fit() on the TimeSeriesPredictor instance. I made no changes to the code upon cloning the repo into Databricks.
Databricks cluster configuration
Results from running the following command
from autogluon.core.utils import show_versions
show_versions()
Here's the output from the run:
Beginning AutoGluon training... Time limit = 800s
AutoGluon will save models to 'model'
=================== System Info ===================
AutoGluon Version:  1.1.1
Python Version:     3.10.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #76~20.04.1-Ubuntu SMP Thu Jun 13 18:00:23 UTC 2024
CPU Count:          8
GPU Count:          0
Memory Avail:       39.17 GB / 57.39 GB (68.2%)
Disk Space Avail:   208.05 GB / 250.92 GB (82.9%)
Setting presets to: medium_quality
Fitting with arguments: {'enable_ensemble': True, 'eval_metric': SMAPE, 'freq': 'D', 'hyperparameters': 'light', 'known_covariates_names': [], 'num_val_windows': 1, 'prediction_length': 30, 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 'random_seed': 123, 'refit_every_n_windows': 1, 'refit_full': False, 'skip_model_selection': False, 'target': 'target', 'time_limit': 800, 'verbosity': 2}
And here's the error message:
File, line 1
----> 1 sv_predictor.fit(
2 train_data,
3 time_limit=800,
4 presets="medium_quality"
5 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-bb2e512f-4367-4c0c-875d-9a07e849fbca/lib/python3.10/site-packages/autogluon/core/utils/decorators.py:31, in unpack.<locals>._unpack_inner.<locals>._call(*args, **kwargs)
28 @functools.wraps(f)
29 def _call(*args, **kwargs):
30 gargs, gkwargs = g(*other_args, *args, **kwargs)
---> 31 return f(*gargs, **gkwargs)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-bb2e512f-4367-4c0c-875d-9a07e849fbca/lib/python3.10/site-packages/autogluon/timeseries/predictor.py:701, in TimeSeriesPredictor.fit(self, train_data, tuning_data, time_limit, presets, hyperparameters, hyperparameter_tune_kwargs, excluded_model_types, num_val_windows, val_step_size, refit_every_n_windows, refit_full, enable_ensemble, skip_model_selection, random_seed, verbosity)
698 logger.info("\nFitting with arguments:")
699 logger.info(f"{pprint.pformat({k: v for k, v in fit_args.items() if v is not None})}\n")
--> 701 train_data = self._check_and_prepare_data_frame(train_data, name="train_data")
702 logger.info(f"Provided train_data has {self._get_dataset_stats(train_data)}")
704 if val_step_size is None:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-bb2e512f-4367-4c0c-875d-9a07e849fbca/lib/python3.10/site-packages/autogluon/timeseries/predictor.py:314, in TimeSeriesPredictor._check_and_prepare_data_frame(self, data, name)
312 logger.info(f"Inferred time series frequency: '{df.freq}'")
313 else:
--> 314 if df.freq != self.freq:
315 logger.warning(f"{name} with frequency '{df.freq}' has been resampled to frequency '{self.freq}'.")
316 df = df.convert_frequency(freq=self.freq)
File /databricks/python/lib/python3.10/site-packages/pandas/core/generic.py:5575, in NDFrame.__getattr__(self, name)
5568 if (
5569 name not in self._internal_names_set
5570 and name not in self._metadata
5571 and name not in self._accessors
5572 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5573 ):
5574 return self[name]
-> 5575 return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'freq'
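One detail in the traceback above may be worth checking: autogluon is imported from the notebook-scoped pythonEnv-... directory, while pandas is imported from /databricks/python/lib/python3.10/site-packages. Whether that is related to the error is an assumption, not something confirmed here. A minimal sketch for reporting which version of each package is installed and where it is imported from, using only the standard library (the helper name describe_package is hypothetical):

```python
import importlib.metadata
import importlib.util


def describe_package(name):
    """Return the installed version and import location of a package.

    Hypothetical helper for illustration; missing packages yield None values.
    """
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the module has no spec.
        spec = None
    location = spec.origin if spec is not None else None
    return {"name": name, "version": version, "location": location}


# Packages that appear in the traceback or the cluster setup.
for pkg in ("pandas", "pyspark", "autogluon.timeseries"):
    print(describe_package(pkg))
```

Running this on both the Spark 3.1.2 cluster and the newer one would show whether the pandas version or location differs between the working and failing environments.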