scikit-optimize / scikit-optimize

Sequential model-based optimization with a `scipy.optimize` interface
https://scikit-optimize.github.io
BSD 3-Clause "New" or "Revised" License

gp_minimize() produces ValueError while tuning xgboost classifier #1155

Open scigeek72 opened 1 year ago

scigeek72 commented 1 year ago

I am trying to tune the hyperparameters of an XGBoost classifier. The target is binary (0/1) and the training features are all numerical. When I run gp_minimize(), I get the following error: Provided transformers should be a Transformer instance. Got <skopt.space.transformers.Identity object. Below is my code (essentially a verbatim copy of the example in the skopt documentation, except that I am using an XGBoost classifier):

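import numpy as np
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from xgboost import XGBClassifier

# Ratio of negative to positive samples, used to offset class imbalance.
# df_feats, xTrain_t and yTrain are assumed to be defined earlier.
scale_pos_weight = len(df_feats[df_feats.isPS==0])/len(df_feats[df_feats.isPS==1])
xgbc = XGBClassifier(scale_pos_weight=scale_pos_weight,
                     objective='binary:hinge')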

space  = [Integer(1, 20, name='max_depth'),
          Integer(100, 1000, name='n_estimators'),
          Real(10**-5, 10**0, "log-uniform", name='learning_rate'),
          Real(0.5, 1,"uniform", name='subsample'),
          Real(10**-5, 10**1, "uniform", name='gamma'),
          Real(10**-5, 10**0, "uniform", name='alpha')]

# The decorator below enables the objective function
# to receive the parameters as keyword arguments.
@use_named_args(space)
def objective(**params):
    '''
    Scikit-Optimize requires an objective function to minimize.
    We use the negated mean of the cross-validated F1 scores, so that
    minimizing the objective maximizes F1.
    '''
    xgbc.set_params(**params)

    # gp_minimize minimizes, so negate the score we want to maximize.
    return -np.mean(cross_val_score(xgbc, xTrain_t.values, yTrain.values, cv=5, n_jobs=-1,
                                    scoring="f1"))

res_gp = gp_minimize(objective, space, n_calls=20, random_state=256)

Below is the traceback to help with the understanding of the issue I am facing:

Traceback (most recent call last):

  Cell In[92], line 1
    res_gp = gp_minimize(objective, space, n_calls=20, random_state=256)

  File ~/opt/anaconda3/envs/datamonitor/lib/python3.10/site-packages/skopt/optimizer/gp.py:252 in gp_minimize
    space = normalize_dimensions(dimensions)

  File ~/opt/anaconda3/envs/datamonitor/lib/python3.10/site-packages/skopt/utils.py:599 in normalize_dimensions
    dimension.set_transformer("normalize")

  File ~/opt/anaconda3/envs/datamonitor/lib/python3.10/site-packages/skopt/space/space.py:493 in set_transformer
    self.transformer = Pipeline(

  File ~/opt/anaconda3/envs/datamonitor/lib/python3.10/site-packages/skopt/space/transformers.py:292 in __init__
    raise ValueError(

ValueError: Provided transformers should be a Transformer instance. Got <skopt.space.transformers.Identity object at 0x7fe10044d090>

I couldn't figure out what is wrong, and I couldn't find any Stack Overflow question about this particular error.

jecjecj commented 1 year ago

I am having the same issue!

AdamCoxson commented 8 months ago

I had similar issues: first the deprecated use of np.int, and then the same ValueError as you.
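For context on the np.int part: the `np.int` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so code that still refers to it fails on recent NumPy. A minimal sketch of the replacement, using made-up values rather than anything from skopt itself:

```python
import numpy as np

# Made-up values standing in for the array that gets rounded.
X_orig = np.array([0.4, 1.6, 2.5])

# The built-in int works as a dtype wherever np.int was used before.
as_builtin = np.round(X_orig).astype(int)

# np.int64 is the explicit fixed-width alternative.
as_int64 = np.round(X_orig).astype(np.int64)

print(as_builtin, as_int64.dtype)
```

Both forms give the same rounded integers here; `np.int64` only matters if a fixed width is needed regardless of platform.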

In the file skopt/space/transformers.py, go to:

line 275. Change np.int to int or int64 in return np.round(X_orig).astype(int)

line 290. To remove the check manually, I commented out this entire block:

         for transformer in self.transformers:
             if not isinstance(transformer, Transformer):
                 raise ValueError(
                     "Provided transformers should be a Transformer "
                     "instance. Got %s" % transformer
                 )
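The check quoted above relies on `isinstance`, which compares class objects, not class names. The sketch below uses hypothetical stand-in classes (not skopt's real ones) to show one general way such a check can reject an object whose name looks right: the class has been created twice, for example when a module is loaded more than once. Whether that is what happens in the environments reported in this thread is an assumption, not something established here.

```python
def load_module():
    """Simulate loading a module: each call creates fresh class objects."""
    class Transformer:          # hypothetical stand-in base class
        pass

    class Identity(Transformer):  # hypothetical stand-in subclass
        pass

    return Transformer, Identity


def check_transformers(transformers, base):
    """Same shape as the quoted check, with the base class passed in."""
    for transformer in transformers:
        if not isinstance(transformer, base):
            raise ValueError(
                "Provided transformers should be a Transformer "
                "instance. Got %s" % transformer
            )


Transformer_a, Identity_a = load_module()
Transformer_b, Identity_b = load_module()

# An Identity from the same "load" as the base class passes.
same_load_ok = isinstance(Identity_a(), Transformer_a)

# An Identity from a second "load" is a different class object and fails.
cross_load_ok = isinstance(Identity_b(), Transformer_a)

check_transformers([Identity_a()], Transformer_a)  # passes silently
try:
    check_transformers([Identity_b()], Transformer_a)
    message = ""
except ValueError as exc:
    message = str(exc)

print(same_load_ok, cross_load_ok)
print(message)
```

If something like this is the cause, commenting out the check hides the symptom; restarting the interpreter so that skopt is imported only once would be the less invasive thing to try first.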

Hopefully gp_minimize keeps ticking along with some development. All available high-level Bayesian Optimisers do the exact same thing so I don't really want to switch all my code.