mattharrison / effective_xgboost_book

Can someone help me understand what's going on here exactly in this XGBoost model grid search #17

Open kareemamrr opened 8 months ago

kareemamrr commented 8 months ago
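import xgboost as xgb
from sklearn import model_selection

# X_train, y_train, X_test, y_test are assumed to be defined earlier.
# Grid of hyperparameters; single-value lists are held fixed.
params = {'reg_lambda': [0],
          'learning_rate': [.1, .3],
          'subsample': [.7, 1],
          'max_depth': [2, 3],
          'random_state': [42],
          'n_jobs': [-1],
          'n_estimators': [200]}

xgb2 = xgb.XGBClassifier(early_stopping_rounds=5)
# eval_set and verbose are extra arguments passed along with the fit call.
cv = (model_selection.GridSearchCV(xgb2, params, cv=3, n_jobs=-1)
      .fit(X_train, y_train,
           eval_set=[(X_test, y_test)],
           verbose=50)
      )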

Since we're using cv=3, the training data is split into 3 folds: the model is trained on 2 folds and validated against the third, for a total of 3 rounds per parameter combination. Where does eval_set come into play here? Is the model validated against this set after each cross-validation run (that is, after all 3 rounds complete), or is it the set used within each of the 3 rounds to validate the model?
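
One way to reason about this is with a simplified model of the search loop. The sketch below is not scikit-learn's actual implementation: `RecordingModel` and `grid_search_sketch` are hypothetical names, the folds are plain contiguous slices rather than the stratified folds scikit-learn uses for classifiers, and the toy data is invented. It assumes the default behaviour in which `GridSearchCV.fit` forwards extra fit arguments such as `eval_set` unchanged to every underlying `fit` call.

```python
from itertools import product

# Same grid as in the question.
params = {'reg_lambda': [0],
          'learning_rate': [.1, .3],
          'subsample': [.7, 1],
          'max_depth': [2, 3],
          'random_state': [42],
          'n_jobs': [-1],
          'n_estimators': [200]}


class RecordingModel:
    """Hypothetical stand-in for XGBClassifier: it only records fit() inputs."""
    calls = []  # shared log of every fit() call

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def fit(self, X, y, eval_set=None, verbose=None):
        # A real XGBClassifier would use eval_set here for early stopping.
        RecordingModel.calls.append(
            {'n_train': len(X), 'n_eval': len(eval_set[0][0])})
        return self

    def score(self, X, y):
        return 0.0  # placeholder; a real model would return accuracy


def grid_search_sketch(X, y, grid, n_splits, **fit_params):
    """Simplified model of the grid search loop (plain, unshuffled folds)."""
    n = len(X)
    fold_of = [i * n_splits // n for i in range(n)]  # fold id for each row
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        candidate = dict(zip(keys, values))
        for fold in range(n_splits):
            train = [i for i in range(n) if fold_of[i] != fold]
            held_out = [i for i in range(n) if fold_of[i] == fold]
            model = RecordingModel(**candidate)
            # fit_params (eval_set, verbose) are passed through unchanged.
            model.fit([X[i] for i in train], [y[i] for i in train],
                      **fit_params)
            # The cross-validation score comes from the held-out fold,
            # not from eval_set.
            model.score([X[i] for i in held_out], [y[i] for i in held_out])
    # With refit=True (the default) there is one final fit on all training
    # rows. A real search refits the best-scoring candidate; the last
    # candidate is used here only as a placeholder.
    RecordingModel(**candidate).fit(X, y, **fit_params)


# Invented toy data: 90 training rows, 30 test rows.
X_train = list(range(90))
y_train = [i % 2 for i in range(90)]
X_test = list(range(30))
y_test = [i % 2 for i in range(30)]

grid_search_sketch(X_train, y_train, params, n_splits=3,
                   eval_set=[(X_test, y_test)], verbose=50)

print(len(RecordingModel.calls))                     # 25
print({c['n_eval'] for c in RecordingModel.calls})   # {30}
```

Under these assumptions, the grid has 8 combinations, so there are 8 × 3 = 24 fold fits plus 1 final refit. Every one of those fits receives the same `(X_test, y_test)` as `eval_set`, which the real `XGBClassifier` would use for early stopping, while the held-out fold is what produces the cross-validation score. If that reading holds for the original code, the test set influences how many trees each candidate keeps, so it is no longer a fully independent check of the chosen model.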