mpolinowski / automl-gluon-tabular-data

AutoML with AutoGluon
https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/AIOps/2023-06-18-automl-with-autogluon-tabular-data/2023-06-18

AttributeError when running in Databricks #1

Open · ANNIKADAHLMANN-8451 opened this issue 3 months ago

ANNIKADAHLMANN-8451 commented 3 months ago

I'm running into an AttributeError when trying to fit an AutoGluon time series model in Databricks. Specifically, the error occurs when calling .fit() on the TimeSeriesPredictor instance. I made no changes to the code after cloning the repo into Databricks.

Databricks cluster configuration

Results from running the following commands:

    from autogluon.core.utils import show_versions
    show_versions()

INSTALLED VERSIONS
------------------
date                   : 2024-07-24
time                   : 16:46:40.799450
python                 : 3.10.12.final.0
OS                     : Linux
OS-release             : 5.15.0-1067-azure
Version                : #76~20.04.1-Ubuntu SMP Thu Jun 13 18:00:23 UTC 2024
machine                : x86_64
processor              : x86_64
num_cores              : 8
cpu_ram_mb             : 58770.0
cuda version           : None
num_gpus               : 0
gpu_ram_mb             : []
avail_disk_size_mb     : 213044

accelerate             : 0.21.0
autogluon              : 1.1.1
autogluon.common       : 1.1.1
autogluon.core         : 1.1.1
autogluon.features     : 1.1.1
autogluon.multimodal   : 1.1.1
autogluon.tabular      : 1.1.1
autogluon.timeseries   : 1.1.1
boto3                  : 1.24.28
catboost               : 1.2.5
defusedxml             : 0.7.1
evaluate               : 0.4.2
fastai                 : 2.7.15
gluonts                : 0.15.1
hyperopt               : 0.2.7
imodels                : None
jinja2                 : 3.1.4
joblib                 : 1.2.0
jsonschema             : 4.21.1
lightgbm               : 4.3.0
lightning              : 2.3.3
matplotlib             : 3.5.2
mlforecast             : 0.10.0
networkx               : 3.3
nlpaug                 : 1.1.11
nltk                   : 3.8.1
nptyping               : 2.4.1
numpy                  : 1.24.4
nvidia-ml-py3          : 7.352.0
omegaconf              : 2.2.3
onnxruntime-gpu        : None
openmim                : 0.3.9
optimum                : 1.18.1
optimum-intel          : None
orjson                 : 3.10.6
pandas                 : 2.2.2
pdf2image              : 1.17.0
Pillow                 : 10.4.0
psutil                 : 5.9.0
pytesseract            : 0.3.10
pytorch-lightning      : 2.3.3
pytorch-metric-learning: 2.3.0
ray                    : 2.10.0
requests               : 2.32.3
scikit-image           : 0.20.0
scikit-learn           : 1.4.0
scikit-learn-intelex   : None
scipy                  : 1.9.1
seqeval                : 1.2.2
setuptools             : 63.4.1
skl2onnx               : None
statsforecast          : 1.4.0
tabpfn                 : None
tensorboard            : 2.17.0
text-unidecode         : 1.3
timm                   : 0.9.16
torch                  : 2.3.1
torchmetrics           : 1.2.1
torchvision            : 0.18.1
tqdm                   : 4.66.4
transformers           : 4.39.3
utilsforecast          : 0.0.10
vowpalwabbit           : None
xgboost                : 2.0.3

Here's the output from the run:

Beginning AutoGluon training... Time limit = 800s
AutoGluon will save models to 'model'
=================== System Info ===================
AutoGluon Version: 1.1.1
Python Version: 3.10.12
Operating System: Linux
Platform Machine: x86_64
Platform Version: #76~20.04.1-Ubuntu SMP Thu Jun 13 18:00:23 UTC 2024
CPU Count: 8
GPU Count: 0
Memory Avail: 39.17 GB / 57.39 GB (68.2%)
Disk Space Avail: 208.05 GB / 250.92 GB (82.9%)

Setting presets to: medium_quality

Fitting with arguments: {'enable_ensemble': True, 'eval_metric': SMAPE, 'freq': 'D', 'hyperparameters': 'light', 'known_covariates_names': [], 'num_val_windows': 1, 'prediction_length': 30, 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 'random_seed': 123, 'refit_every_n_windows': 1, 'refit_full': False, 'skip_model_selection': False, 'target': 'target', 'time_limit': 800, 'verbosity': 2}

And here's the error message:

File , line 1
----> 1 sv_predictor.fit(
      2     train_data,
      3     time_limit=800,
      4     presets="medium_quality"
      5 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-bb2e512f-4367-4c0c-875d-9a07e849fbca/lib/python3.10/site-packages/autogluon/core/utils/decorators.py:31, in unpack.<locals>._unpack_inner.<locals>._call(*args, **kwargs)
     28 @functools.wraps(f)
     29 def _call(*args, **kwargs):
     30     gargs, gkwargs = g(*other_args, *args, **kwargs)
---> 31     return f(*gargs, **gkwargs)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-bb2e512f-4367-4c0c-875d-9a07e849fbca/lib/python3.10/site-packages/autogluon/timeseries/predictor.py:701, in TimeSeriesPredictor.fit(self, train_data, tuning_data, time_limit, presets, hyperparameters, hyperparameter_tune_kwargs, excluded_model_types, num_val_windows, val_step_size, refit_every_n_windows, refit_full, enable_ensemble, skip_model_selection, random_seed, verbosity)
    698 logger.info("\nFitting with arguments:")
    699 logger.info(f"{pprint.pformat({k: v for k, v in fit_args.items() if v is not None})}\n")
--> 701 train_data = self._check_and_prepare_data_frame(train_data, name="train_data")
    702 logger.info(f"Provided train_data has {self._get_dataset_stats(train_data)}")
    704 if val_step_size is None:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-bb2e512f-4367-4c0c-875d-9a07e849fbca/lib/python3.10/site-packages/autogluon/timeseries/predictor.py:314, in TimeSeriesPredictor._check_and_prepare_data_frame(self, data, name)
    312     logger.info(f"Inferred time series frequency: '{df.freq}'")
    313 else:
--> 314     if df.freq != self.freq:
    315         logger.warning(f"{name} with frequency '{df.freq}' has been resampled to frequency '{self.freq}'.")
    316         df = df.convert_frequency(freq=self.freq)

File /databricks/python/lib/python3.10/site-packages/pandas/core/generic.py:5575, in NDFrame.__getattr__(self, name)
   5568 if (
   5569     name not in self._internal_names_set
   5570     and name not in self._metadata
   5571     and name not in self._accessors
   5572     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5573 ):
   5574     return self[name]
-> 5575 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'freq'
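The failing line reads `df.freq`. AutoGluon's `TimeSeriesDataFrame` provides `freq`, but a plain pandas `DataFrame` does not, and the message names the type as `DataFrame`. So by that point `df` was evidently a plain `DataFrame` and not the subclass.

One way this can happen is a pandas operation that builds its result with the base `DataFrame` class instead of the subclass. This is a hypothesis, not a confirmed diagnosis. The pandas-only sketch below reproduces that mechanism with hypothetical `FreqFrame` and `PreservedFreqFrame` classes. It is an illustration, not AutoGluon code.

```python
import pandas as pd


class FreqFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass with a custom `freq` property.

    Illustration only, not AutoGluon's TimeSeriesDataFrame. It does NOT
    override `_constructor`, so pandas builds new frames (e.g. from .copy())
    with the base pd.DataFrame class and the property is lost.
    """

    @property
    def freq(self):
        # Infer the frequency string (e.g. "D") from the DatetimeIndex
        return pd.infer_freq(self.index)


class PreservedFreqFrame(FreqFrame):
    """Same hypothetical subclass, but new frames keep the subclass type."""

    @property
    def _constructor(self):
        return PreservedFreqFrame


index = pd.date_range("2024-01-01", periods=5, freq="D")
values = {"target": [1.0, 2.0, 3.0, 4.0, 5.0]}

lossy = FreqFrame(values, index=index)
kept = PreservedFreqFrame(values, index=index)

print(lossy.freq)                  # D
lossy_copy = lossy.copy()          # built with pd.DataFrame, not FreqFrame
print(type(lossy_copy).__name__)   # DataFrame
try:
    lossy_copy.freq
except AttributeError as err:
    print(err)                     # 'DataFrame' object has no attribute 'freq'

kept_copy = kept.copy()
print(type(kept_copy).__name__)    # PreservedFreqFrame
print(kept_copy.freq)              # D
```

The two classes differ only in the `_constructor` override. pandas uses that hook, as described in its guide to subclassing pandas data structures, to decide which type newly built frames get. It is still unconfirmed whether something like this actually happens on the newer Databricks runtime.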

ANNIKADAHLMANN-8451 commented 3 months ago

UPDATE: I can run the notebook successfully on Spark 3.1.2, but I get this error on newer Spark versions in my Databricks cluster. Is there any documentation or support for running AutoGluon on newer Spark versions?
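The traceback loads two libraries from different places. AutoGluon comes from the notebook-scoped environment under `/local_disk0/.ephemeral_nfs/`, and pandas comes from `/databricks/python/lib/python3.10/site-packages`. Meanwhile, `show_versions()` reports pandas 2.2.2. The Spark version is tied to the Databricks runtime, and each runtime ships its own preinstalled Python libraries. One plausible but unconfirmed explanation is that a different pandas build gets imported on the newer runtimes.

The minimal sketch below checks this using only the standard library. For a few packages, it prints the version the interpreter actually imported, the version recorded in installed package metadata, and the module's file path. The helper name `describe_import` is hypothetical.

```python
import importlib
import importlib.metadata


def describe_import(module_name, dist_name=None):
    """Hypothetical helper: return (imported version, metadata version, file path).

    A mismatch between the first two values, or a file path outside the
    environment you installed into, suggests another copy of the package
    is shadowing the one you expect. Returns (None, None, None) if the
    module cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None, None, None
    imported_version = getattr(module, "__version__", None)
    try:
        installed_version = importlib.metadata.version(dist_name or module_name)
    except importlib.metadata.PackageNotFoundError:
        installed_version = None
    return imported_version, installed_version, getattr(module, "__file__", None)


for name in ("pandas", "numpy", "autogluon.timeseries"):
    imported, installed, path = describe_import(name)
    print(f"{name}: imported={imported} metadata={installed} file={path}")
```

Running this in the same notebook on both the working (Spark 3.1.2) and failing clusters would show whether the pandas that AutoGluon sees differs between them.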