tvdboom / ATOM

Automated Tool for Optimized Modelling
https://tvdboom.github.io/ATOM/
MIT License

Hyperparameter tuning - IndexError: positional indexers are out-of-bounds #26

Closed: RavinduThaveesha closed this issue 2 years ago

RavinduThaveesha commented 2 years ago


Description

Hyperparameter tuning fails when running the code below:

```python
atom.run(
    models=["ET"],
    n_calls=30,
    bo_params={"dimensions": {"all": "n_estimators"}},
)
```

Actual behaviour

The run raises `IndexError: positional indexers are out-of-bounds` in a Kaggle notebook.

Python and package version

tvdboom commented 2 years ago

Hi, I can't reproduce the error. Could you share the complete code, the data, and the full error traceback?

RavinduThaveesha commented 2 years ago

Data

https://www.kaggle.com/c/ml-olympiad-tensorflow-malaysia-user-group/data

Code

```python
atom = ATOMClassifier(X, y, test_size=0.1, verbose=2, random_state=1)
atom.impute(strat_num="mean", strat_cat="drop", max_nan_cols=0.8)
atom.balance(strategy="smote")
atom.scale()
atom.run(models="ET", n_calls=15, bo_params={"dimensions": "n_estimators"})
```

Error Traceback

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_34/2698383641.py in <module>
----> 1 atom.run(models="ET", n_calls=15, bo_params={"dimensions": "n_estimators"})

~/.local/lib/python3.7/site-packages/atom/utils.py in wrapper(*args, **kwargs)
   1188                 raise exception  # Always raise it
   1189         else:
-> 1190             return f(*args, **kwargs)
   1191
   1192     return wrapper

~/.local/lib/python3.7/site-packages/atom/utils.py in wrapper(*args, **kwargs)
   1206             logger.info(f"{args[0].__class__.__name__}.{f.__name__}()")
   1207
-> 1208         return f(*args, **kwargs)
   1209
   1210     return wrapper

/opt/conda/lib/python3.7/site-packages/typeguard/__init__.py in wrapper(*args, **kwargs)
   1031         memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032         check_argument_types(memo)
-> 1033         retval = func(*args, **kwargs)
   1034         try:
   1035             check_return_type(retval, memo)

~/.local/lib/python3.7/site-packages/atom/atom.py in run(self, models, metric, greater_is_better, needs_proba, needs_threshold, n_calls, n_initial_points, est_params, bo_params, n_bootstrap, **kwargs)
   1660             trainer = DirectRegressor(*params, **kwargs)
   1661
-> 1662         self._run(trainer)
   1663
   1664     @composed(crash, method_to_log, typechecked)

~/.local/lib/python3.7/site-packages/atom/atom.py in _run(self, trainer)
   1608             trainer._branches = self._branches
   1609             trainer.scaled = self.scaled
-> 1610             trainer.run()
   1611         finally:
   1612             # Catch errors and pass them to atom's attribute

~/.local/lib/python3.7/site-packages/atom/utils.py in wrapper(*args, **kwargs)
   1188                 raise exception  # Always raise it
   1189         else:
-> 1190             return f(*args, **kwargs)
   1191
   1192     return wrapper

~/.local/lib/python3.7/site-packages/atom/utils.py in wrapper(*args, **kwargs)
   1206             logger.info(f"{args[0].__class__.__name__}.{f.__name__}()")
   1207
-> 1208         return f(*args, **kwargs)
   1209
   1210     return wrapper

~/.local/lib/python3.7/site-packages/atom/training.py in run(self, *arrays)
     64         self._prepare_parameters()
     65
---> 66         self._core_iteration()
     67
     68

~/.local/lib/python3.7/site-packages/atom/basetrainer.py in _core_iteration(self)
    513         if not self._models:
    514             if len(self.errors) == 1:
--> 515                 raise self.errors[0]
    516             else:
    517                 raise RuntimeError(

~/.local/lib/python3.7/site-packages/atom/basetrainer.py in _core_iteration(self)
    475                 # If it has predefined or custom dimensions, run the BO
    476                 if (m._dimensions or hasattr(m, "get_dimensions")) and m._n_calls > 0:
--> 477                     m.bayesian_optimization()
    478
    479                 m.fit()

~/.local/lib/python3.7/site-packages/atom/basemodel.py in bayesian_optimization(self)
    540                     if isinstance(base_estimator, str):
    541                         if base_estimator.lower() == "gp":
--> 542                             gp_minimize(**kwargs)
    543                         elif base_estimator.lower() == "et":
    544                             forest_minimize(base_estimator="ET", **kwargs)

/opt/conda/lib/python3.7/site-packages/skopt/optimizer/gp.py in gp_minimize(func, dimensions, base_estimator, n_calls, n_random_starts, n_initial_points, initial_point_generator, acq_func, acq_optimizer, x0, y0, random_state, verbose, callback, n_points, n_restarts_optimizer, xi, kappa, noise, n_jobs, model_queue_size)
    266         n_restarts_optimizer=n_restarts_optimizer,
    267         x0=x0, y0=y0, random_state=rng, verbose=verbose,
--> 268         callback=callback, n_jobs=n_jobs, model_queue_size=model_queue_size)

/opt/conda/lib/python3.7/site-packages/skopt/optimizer/base.py in base_minimize(func, dimensions, base_estimator, n_calls, n_random_starts, n_initial_points, initial_point_generator, acq_func, acq_optimizer, x0, y0, random_state, verbose, callback, n_points, n_restarts_optimizer, xi, kappa, n_jobs, model_queue_size)
    297     for n in range(n_calls):
    298         next_x = optimizer.ask()
--> 299         next_y = func(next_x)
    300         result = optimizer.tell(next_x, next_y)
    301         result.specs = specs

~/.local/lib/python3.7/site-packages/atom/basemodel.py in <lambda>(x)
    522         bo_kwargs = self.T._bo.copy()  # Don't pop params from trainer
    523         kwargs = dict(
--> 524             func=lambda x: optimize(**self.get_parameters(x)),
    525             dimensions=self._dimensions,
    526             n_calls=self._n_calls,

~/.local/lib/python3.7/site-packages/atom/basemodel.py in optimize(**params)
    352
    353                     # Fit model just on the one fold
--> 354                     score = fit_model(*next(fold.split(self.X_train, self.y_train)))
    355
    356                 else:  # Use cross validation to get the score

~/.local/lib/python3.7/site-packages/atom/basemodel.py in fit_model(train_idx, val_idx)
    283                 # Define subsets from original dataset
    284                 branch = self.T._get_og_branches()[0]
--> 285                 X_subtrain = branch.dataset.iloc[train_idx, :-1]
    286                 y_subtrain = branch.dataset.iloc[train_idx, -1]
    287                 X_val = branch.dataset.iloc[val_idx, :-1]

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1504     def _getitem_tuple(self, tup: tuple):
   1505
-> 1506         self._has_valid_tuple(tup)
   1507         with suppress(IndexingError):
   1508             return self._getitem_lowerdim(tup)

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    752         for i, k in enumerate(key):
    753             try:
--> 754                 self._validate_key(k, i)
    755             except ValueError as err:
    756                 raise ValueError(

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
   1422             # check that the key does not exceed the maximum size of the index
   1423             if len(arr) and (arr.max() >= len_axis or arr.min() < -len_axis):
-> 1424                 raise IndexError("positional indexers are out-of-bounds")
   1425         else:
   1426             raise ValueError(f"Can only index by location with a [{self._valid_types}]")

IndexError: positional indexers are out-of-bounds
```

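For context, the last frames of the traceback suggest a plausible reading of the failure: the fold indices are computed from the transformed training set (`fold.split(self.X_train, self.y_train)`), which SMOTE has enlarged, but they are then applied positionally with `.iloc` to the smaller original branch dataset. This interpretation is my assumption, not confirmed by the maintainer; the pandas side of it, however, can be reproduced in isolation:

```python
import pandas as pd

# Stand-in for the original (un-resampled) branch data: 5 rows.
original = pd.DataFrame({"feature": range(5), "target": [0, 0, 0, 1, 1]})

# Stand-in for the training set after oversampling (e.g. SMOTE): 8 rows.
resampled = pd.DataFrame({"feature": range(8), "target": [0] * 4 + [1] * 4})

# Positional fold indices computed on the larger resampled set...
train_idx = list(range(len(resampled)))  # 0..7

# ...are out of range for the smaller original frame, as in the report.
try:
    original.iloc[train_idx, :-1]
except IndexError as err:
    print(err)  # positional indexers are out-of-bounds
```

The error surfaces only when the transformed data has a different number of rows than the original, which is why the `atom.balance(strategy="smote")` step matters for reproducing it.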
tvdboom commented 2 years ago

This is indeed a bug. Thanks for pointing it out. I pushed a fix to the development branch; it will be available with the next release. If you can't wait (that may take up to a month, since the latest release was quite recent), you can install the updated package directly from git by running `pip install git+https://github.com/tvdboom/ATOM.git@development`.

RavinduThaveesha commented 2 years ago

It's working fine now on the development branch. Thanks for the quick fix; you saved the day. Keep up the good work!