mariesosa closed this issue 3 years ago
@mariesosa thanks a bunch for opening this issue!
@krfricke this looks like an interesting bug where we pass too many items to the Search Algorithm to convert the search space:
(base) ➜ tune-sklearn git:(master) ✗ BETTER_EXCEPTIONS=1 python _test.py
Traceback (most recent call last):
File "_test.py", line 18, in <module>
tune_search = tune_search.fit(X, y)
│ │ │ └ array([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, ...
│ │ └ array([[ -4.79205113, -6.04797091, 1.80047773, -10.84377091,
5.00779976],
[ -5.20333143, -5.19684753, 0.9...
│ └ TuneSearchCV(cv=5, estimator=RandomForestClassifier(),
loggers=[<class 'ray.tune.logger.CSVLogger'>,
...
└ TuneSearchCV(cv=5, estimator=RandomForestClassifier(),
loggers=[<class 'ray.tune.logger.CSVLogger'>,
...
File "/Users/rliaw/dev/tune-sklearn/tune_sklearn/tune_basesearch.py", line 663, in fit
result = self._fit(X, y, groups, **fit_params)
│ │ │ │ └ {}
│ │ │ └ None
│ │ └ array([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, ...
│ └ array([[ -4.79205113, -6.04797091, 1.80047773, -10.84377091,
5.00779976],
[ -5.20333143, -5.19684753, 0.9...
└ TuneSearchCV(cv=5, estimator=RandomForestClassifier(),
loggers=[<class 'ray.tune.logger.CSVLogger'>,
...
File "/Users/rliaw/dev/tune-sklearn/tune_sklearn/tune_basesearch.py", line 564, in _fit
analysis = self._tune_run(config, resources_per_trial)
│ │ └ {'cpu': 1, 'gpu': 0}
│ └ {'early_stopping': False, 'early_stop_type': <EarlyStopping.NO_EARLY_STOP: 7>, 'X_id': ObjectRef(fffffffffffffffffffffffffffffff...
└ TuneSearchCV(cv=5, estimator=RandomForestClassifier(),
loggers=[<class 'ray.tune.logger.CSVLogger'>,
...
File "/Users/rliaw/dev/tune-sklearn/tune_sklearn/tune_search.py", line 715, in _tune_run
analysis = tune.run(trainable, **run_args)
│ │ └ {'scheduler': None, 'reuse_actors': True, 'verbose': 0, 'stop': <ray.tune.stopper.MaximumIterationStopper object at 0x7faf5083d1...
│ └ <class 'tune_sklearn._trainable._Trainable'>
└ <module 'ray.tune' from '/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/__init__.py'>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/tune.py", line 428, in run
if config and not search_alg.set_search_properties(metric, mode, config):
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/suggest/search_generator.py", line 53, in set_search_properties
return self.searcher.set_search_properties(metric, mode, config)
│ │ │ └ {'early_stopping': False, 'early_stop_type': <EarlyStopping.NO_EARLY_STOP: 7>, 'X_id': ObjectRef(fffffffffffffffffffffffffffffff...
│ │ └ None
│ └ None
└ <ray.tune.suggest.search_generator.SearchGenerator object at 0x7faf30824150>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/suggest/hyperopt.py", line 258, in set_search_properties
self._setup_hyperopt()
└ <ray.tune.suggest.hyperopt.HyperOptSearch object at 0x7faf30824810>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/tune/suggest/hyperopt.py", line 200, in _setup_hyperopt
self.domain = hpo.Domain(lambda spc: spc, self._space)
│ │ └ <ray.tune.suggest.hyperopt.HyperOptSearch object at 0x7faf30824810>
│ └ <module 'hyperopt' from '/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/__init__.py'>
└ <ray.tune.suggest.hyperopt.HyperOptSearch object at 0x7faf30824810>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/base.py", line 822, in __init__
self.expr = pyll.as_apply(expr)
│ │ └ {'early_stopping': False, 'early_stop_type': <EarlyStopping.NO_EARLY_STOP: 7>, 'X_id': ObjectRef(fffffffffffffffffffffffffffffff...
│ └ <module 'hyperopt.pyll' from '/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/pyll/__init__.py'>
└ <hyperopt.base.Domain object at 0x7faf30ac5d10>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/pyll/base.py", line 220, in as_apply
named_args = [(k, as_apply(v)) for (k, v) in items]
│ └ [('X_id', ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000001000000)), ('cv', StratifiedKFold(n_splits=5, random_state=...
└ <function as_apply at 0x7faf105dd0e0>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/pyll/base.py", line 220, in <listcomp>
named_args = [(k, as_apply(v)) for (k, v) in items]
│ │ │ │ └ [RandomForestClassifier()]
│ │ │ └ 'estimator_list'
│ │ └ [RandomForestClassifier()]
│ └ <function as_apply at 0x7faf105dd0e0>
└ 'estimator_list'
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/pyll/base.py", line 212, in as_apply
rval = Apply('pos_args', [as_apply(a) for a in obj], {}, None)
│ │ └ [RandomForestClassifier()]
│ └ <function as_apply at 0x7faf105dd0e0>
└ <class 'hyperopt.pyll.base.Apply'>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/pyll/base.py", line 212, in <listcomp>
rval = Apply('pos_args', [as_apply(a) for a in obj], {}, None)
│ │ │ └ RandomForestClassifier()
│ │ └ RandomForestClassifier()
│ └ <function as_apply at 0x7faf105dd0e0>
└ <class 'hyperopt.pyll.base.Apply'>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/pyll/base.py", line 226, in as_apply
rval = Literal(obj)
│ └ RandomForestClassifier()
└ <class 'hyperopt.pyll.base.Literal'>
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/hyperopt/pyll/base.py", line 543, in __init__
o_len = len(obj)
└ RandomForestClassifier()
File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/sklearn/ensemble/_base.py", line 164, in __len__
return len(self.estimators_)
└ RandomForestClassifier()
AttributeError: 'RandomForestClassifier' object has no attribute 'estimators_'
Oh interesting! I'll look into this more closely tomorrow!
So this issue comes up because the self.estimators_ attribute of the RandomForestClassifier() only gets set when fit() is called.
However, during search space initialization, HyperOpt calls len(param):
class Literal(Apply):
    def __init__(self, obj=None):
        try:
            o_len = len(obj)
        except TypeError:
            o_len = None
        Apply.__init__(self, "literal", [], {}, o_len, pure=True)
        self._obj = obj
and this does not raise a TypeError, but rather an AttributeError, because len(RandomForestClassifier()) returns the number of estimators (decision trees), which are not initialized yet:
def __len__(self):
    """Return the number of estimators in the ensemble."""
    return len(self.estimators_)
While I think this is actually a bug in Hyperopt (it should probably catch a broader exception instead), we might be able to circumvent it by not passing the estimators as objects but passing them via object store references instead. I'll work on a fix today.
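A rough sketch of the reference idea, under stated assumptions: `_STORE`, `put`, and `get` below are a hypothetical in-memory stand-in for an object store (with Ray, `ray.put` and `ray.get` would play these roles), and `UnfittedEnsemble` is again a stand-in for the unfitted estimator. This is not the actual tune-sklearn fix, just an illustration of why a reference is safe to hand to the searcher.

```python
import uuid

# Hypothetical in-memory stand-in for an object store.
_STORE = {}


def put(obj):
    """Store obj and return an opaque string reference to it."""
    ref = "ref-" + uuid.uuid4().hex
    _STORE[ref] = obj
    return ref


def get(ref):
    """Resolve a reference back to the stored object."""
    return _STORE[ref]


class UnfittedEnsemble:
    """Hypothetical stand-in for an ensemble estimator before fit()."""

    def __len__(self):
        return len(self.estimators_)


estimator = UnfittedEnsemble()

# The config carries a reference, not the estimator itself.
config = {"estimator_list": [put(estimator)]}

# A searcher probing len() on the leaf only ever touches the reference.
leaf = config["estimator_list"][0]
print(type(leaf).__name__, len(leaf))

# The trainable resolves the reference when it needs the real object.
restored = [get(ref) for ref in config["estimator_list"]]
print(restored[0] is estimator)
```

Because the search algorithm never sees the estimator, its `__len__` is never called during search space setup.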
Describe the bug
When a TuneSearchCV is fitted with an unfitted sklearn.ensemble.RandomForestClassifier and search_optimization="hyperopt", it raises the following error: AttributeError: 'RandomForestClassifier' object has no attribute 'estimators_'.
Steps/Code to Reproduce
Expected Results
No error is thrown.
Actual Results
Versions
hyperopt==0.2.5 numpy==1.18.4 ray==1.2.0 scikit-learn==0.24.1 tune_sklearn==1.2.0
A bit of a clue
The assignment config["estimator_list"] = [self.estimator] in https://github.com/ray-project/tune-sklearn/blob/master/tune_sklearn/tune_search.py#L627 may be involved. Indeed, hyperopt seems to use this entry while it is being configured and calls len() on the estimator.
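One way to check a hunch like this is to walk the config and report which entries break `len()` with something other than a TypeError. The helper below is a hypothetical diagnostic written for this illustration, not part of tune-sklearn or hyperopt; it only loosely follows the dict, then list, then leaf recursion visible in the traceback, and `UnfittedEnsemble` is a stand-in for the unfitted estimator.

```python
def find_len_failures(value, path="config"):
    """Return (path, exception name) for leaves whose len() fails
    with anything other than TypeError."""
    failures = []
    if isinstance(value, dict):
        for key, item in value.items():
            failures += find_len_failures(item, f"{path}[{key!r}]")
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            failures += find_len_failures(item, f"{path}[{index}]")
    else:
        try:
            len(value)
        except TypeError:
            pass  # objects with no length at all are harmless here
        except Exception as err:
            failures.append((path, type(err).__name__))
    return failures


class UnfittedEnsemble:
    """Hypothetical stand-in for an ensemble estimator before fit()."""

    def __len__(self):
        return len(self.estimators_)


# Made-up config loosely shaped like the one in the traceback.
config = {
    "early_stopping": False,
    "cv": 5,
    "estimator_list": [UnfittedEnsemble()],
}

print(find_len_failures(config))
```

On this made-up config, the only entry reported is the one under "estimator_list", which matches the clue above.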