Closed gengbo-genentech closed 3 years ago
Hi @gengbo-genentech! I have tried to reproduce the issue, but it works for me. Please make sure you use probatus 1.7.0, since BayesSearchCV is only supported from that version onward. If that does not work, please try updating skopt. Let us know whether that helps; otherwise, we will investigate further.
Hi @Matgrb Thank you for your response! Which versions of skopt and sklearn are you using?
These ones work for me:
scikit-learn 0.23.2
scikit-optimize 0.8.1
shap 0.39.0
Which ones do you use, and does it work with 1.7.0 probatus?
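To compare environments quickly, the installed versions can be printed in one go. This is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and that the packages were installed under their usual PyPI distribution names; `installed_versions` is a hypothetical helper, not part of probatus.

```python
from importlib import metadata  # standard library in Python 3.8+

# Distribution names as published on PyPI (an assumption about how they were installed).
PACKAGES = ["probatus", "scikit-learn", "scikit-optimize", "scipy", "shap", "xgboost"]


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:16} {version or 'not installed'}")
```

Pasting the printed list into the issue makes it easier to spot a version mismatch between two setups.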
If updating probatus to 1.7.0 does not help, then we need to investigate further.
It would be helpful if you could run the following code:
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from probatus.feature_elimination import ShapRFECV

clf = xgb.XGBClassifier()
# Search space for the Bayesian hyperparameter optimization
param_grid = {
    'max_depth': Integer(1, 11),
    'learning_rate': Real(0.0001, 0.5, prior='log-uniform'),
    'n_estimators': Integer(50, 5000, prior='uniform'),
    'gamma': Real(0.0001, 5, prior='log-uniform'),
    'min_child_weight': Real(1, 10, prior='log-uniform'),
    'subsample': Real(0.5, 1, prior='uniform'),
    'colsample_bytree': Real(0.5, 1, prior='uniform'),
    'colsample_bylevel': Real(0.5, 1, prior='uniform'),
    'reg_alpha': Real(0.0001, 1, prior='log-uniform'),
    'reg_lambda': Real(1, 10, prior='log-uniform'),
}
xgb_search = BayesSearchCV(clf, search_spaces=param_grid, n_iter=32, cv=5, random_state=0, scoring='roc_auc', refit=False)
shap_elimination = ShapRFECV(xgb_search, step=0.2, cv=StratifiedKFold(5), scoring='roc_auc', n_jobs=3)
# Should print True when the search object is detected correctly
print(shap_elimination.search_clf)
The search_clf boolean should be True in this case. It indicates that the provided classifier is wrapped in a SearchCV object that performs optimization first.
If the output is True, this means that probatus detects correctly that it is a BaseSearchCV, and the bug is possibly in the part where we run the optimization:
# Optimize parameters
if self.search_clf:
    current_search_clf = clone(self.clf).fit(current_X, self.y)
    current_clf = current_search_clf.estimator.set_params(**current_search_clf.best_params_)
else:
    current_clf = clone(self.clf)
If the output is False, the issue must be in the following lines:
if isinstance(self.clf, BaseSearchCV):
    self.search_clf = True
else:
    self.search_clf = False
@gengbo-genentech Did upgrading to probatus 1.7.0 fix the issue?
@Matgrb I found that sklearn 0.24.1 is currently not supported by skopt.BayesSearchCV, so I used:
sklearn 0.23.2
probatus 1.7.0
skopt 0.8.1
I found that the following code works fine for me.
if self.search_clf:
    current_search_clf = clone(self.clf).fit(current_X, self.y)
    current_clf = current_search_clf.estimator.set_params(**current_search_clf.best_params_)
However, shap_elimination.fit_compute() raises the following error:
1 report = shap_elimination.fit_compute(x, y, check_additivity=False)
/opt/anaconda3/lib/python3.7/site-packages/probatus/feature_elimination/feature_elimination.py in fit_compute(self, X, y, columns_to_keep, column_names, **shap_kwargs)
613 """
614
--> 615 self.fit(X, y, columns_to_keep=columns_to_keep, column_names=column_names, **shap_kwargs)
616 return self.compute()
617
/opt/anaconda3/lib/python3.7/site-packages/probatus/feature_elimination/feature_elimination.py in fit(self, X, y, columns_to_keep, column_names, **shap_kwargs)
501 # Optimize parameters
502 if self.search_clf:
--> 503 current_search_clf = clone(self.clf).fit(current_X, self.y)
504 current_clf = current_search_clf.estimator.set_params(**current_search_clf.best_params_)
505 else:
/opt/anaconda3/lib/python3.7/site-packages/skopt/searchcv.py in fit(self, X, y, groups, callback)
692 optim_result = self._step(
693 X, y, search_space, optimizer,
--> 694 groups=groups, n_points=n_points_adjusted
695 )
696 n_iter -= n_points
/opt/anaconda3/lib/python3.7/site-packages/skopt/searchcv.py in _step(self, X, y, search_space, optimizer, groups, n_points)
563
564 # get parameter values to evaluate
--> 565 params = optimizer.ask(n_points=n_points)
566
567 # convert parameters to python native types
/opt/anaconda3/lib/python3.7/site-packages/skopt/optimizer/optimizer.py in ask(self, n_points, strategy)
415 opt._tell(x, (y_lie, t_lie))
416 else:
--> 417 opt._tell(x, y_lie)
418
419 self.cache_ = {(n_points, strategy): X} # cache_ the result
/opt/anaconda3/lib/python3.7/site-packages/skopt/optimizer/optimizer.py in _tell(self, x, y, fit)
534 with warnings.catch_warnings():
535 warnings.simplefilter("ignore")
--> 536 est.fit(self.space.transform(self.Xi), self.yi)
537
538 if hasattr(self, "next_xs_") and self.acq_func == "gp_hedge":
/opt/anaconda3/lib/python3.7/site-packages/skopt/learning/gaussian_process/gpr.py in fit(self, X, y)
193 noise_level=self.noise, noise_level_bounds="fixed"
194 )
--> 195 super(GaussianProcessRegressor, self).fit(X, y)
196
197 self.noise_ = None
/opt/anaconda3/lib/python3.7/site-packages/sklearn/gaussian_process/_gpr.py in fit(self, X, y)
232 optima = [(self._constrained_optimization(obj_func,
233 self.kernel_.theta,
--> 234 self.kernel_.bounds))]
235
236 # Additional runs are performed from log-uniform chosen initial
/opt/anaconda3/lib/python3.7/site-packages/sklearn/gaussian_process/_gpr.py in _constrained_optimization(self, obj_func, initial_theta, bounds)
502 obj_func, initial_theta, method="L-BFGS-B", jac=True,
503 bounds=bounds)
--> 504 _check_optimize_result("lbfgs", opt_res)
505 theta_opt, func_min = opt_res.x, opt_res.fun
506 elif callable(self.optimizer):
/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/optimize.py in _check_optimize_result(solver, result, max_iter, extra_warning_msg)
241 " https://scikit-learn.org/stable/modules/"
242 "preprocessing.html"
--> 243 ).format(solver, result.status, result.message.decode("latin1"))
244 if extra_warning_msg is not None:
245 warning_msg += "\n" + extra_warning_msg
AttributeError: 'str' object has no attribute 'decode'
I think this issue is related to the scikit-optimize or sklearn packages. To confirm that, you can try running search_clf.fit(X, y) before passing the search object to probatus fit_compute. If the error appears there as well, it means that the issue is related to these packages. Could you test whether this works or throws an error?
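The check suggested above, running the search on its own before handing it to probatus, could be sketched as follows. `try_fit` is a hypothetical helper written for this thread, not a probatus or sklearn function; it only assumes the object has a `fit(X, y)` method.

```python
import traceback


def try_fit(estimator, X, y):
    """Call estimator.fit(X, y); return None on success, or the exception raised."""
    try:
        estimator.fit(X, y)
    except Exception as exc:  # deliberately broad: we only want to see where it fails
        traceback.print_exc()
        return exc
    return None


# Intended usage, assuming xgb_search, X and y are defined as in the snippet earlier
# in this thread:
#     error = try_fit(xgb_search, X, y)
#     if error is not None:
#         print("Fails without probatus, so the cause is likely in skopt/sklearn/scipy")
```

If the same AttributeError appears here, probatus can be ruled out as the source.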
Alternatively, I checked online for similar issues and there are two options I see:
Also, if this issue is related to sklearn or scikit-optimize you can:
Hi @Matgrb
The bug AttributeError: 'str' object has no attribute 'decode' is solved when I downgrade scipy to version 1.5.3.
Thank you so much for your help!
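One reading that is consistent with both the traceback and the scipy downgrade: the last frame, in sklearn/utils/optimize.py, calls result.message.decode("latin1"), which only works when the optimizer message is a bytes object. If the installed scipy returns that message as a str (an assumption, not verified in this thread), the call fails exactly as shown. The sketch below reproduces the mechanism in isolation; `message_to_text` is a hypothetical helper for illustration and is not part of sklearn or scipy.

```python
def message_to_text(message):
    """Return an optimizer status message as str, whether it arrives as bytes or str."""
    if isinstance(message, bytes):
        return message.decode("latin1")
    return message


# bytes has a .decode method, str does not, which matches the AttributeError above.
assert hasattr(b"CONVERGENCE", "decode")
assert not hasattr("CONVERGENCE", "decode")

print(message_to_text(b"CONVERGENCE"))  # bytes input is decoded
print(message_to_text("CONVERGENCE"))   # str input is passed through unchanged
```

Pinning scipy, as reported above, avoids the mismatch without touching any library code.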
I am trying to use ShapRFECV with BayesSearchCV, as in the code described below.
But I got an error like: