yanyachen / rBayesianOptimization

Bayesian Optimization of Hyperparameters

Best parameter set leads to overfitting #42

Open IanniMuliterno opened 3 years ago

IanniMuliterno commented 3 years ago

I am trying to optimize an xgboost model. The number of trees ('nrounds') is among the parameters I want to optimize, and in the end the BayesianOptimization function returns a "best set" with nrounds at its maximum. When I apply this best set and compare train and test performance, I see a huge gap (train KS around 50, test KS around 37).

I've been searching for ways to avoid overfitting during the optimization process, but I haven't had any luck. Here are the inputs to the function; any help will be appreciated.


library(xgboost)
library(rBayesianOptimization)

xgb_cv_bayes <- function(max.depth,
                         eta,
                         gamma,
                         colsample_bytree,
                         subsample,
                         min_child_weight,
                         scale_pos_weight,
                         nrounds) {
  cv <- xgb.cv(params = list(booster = "gbtree",
                             eta = eta,
                             max_depth = max.depth,
                             gamma = gamma,                       # was accepted but never passed on
                             min_child_weight = min_child_weight,
                             subsample = subsample,
                             colsample_bytree = colsample_bytree,
                             scale_pos_weight = scale_pos_weight, # was accepted but never passed on
                             lambda = 1, alpha = 0,
                             objective = "binary:logistic",
                             eval_metric = IM_KS), # I've adapted a KS function here, but you can use AUC, the default for cases like these
               data = dmatrix_treino,
               nrounds = nrounds, # nrounds is an xgb.cv argument, not a params entry; the fixed nround = 100 was overriding the tuned value
               folds = cv_folds, prediction = TRUE, showsd = TRUE,
               early_stopping_rounds = 5, maximize = TRUE, verbose = 0)
  # Report the best cross-validated KS as the score for the optimizer.
  list(Score = cv$evaluation_log[, max(test_ks_mean)],
       Pred = cv$pred)
}

OPT_Res <- BayesianOptimization(xgb_cv_bayes,
                                bounds = list(max.depth = c(2L, 6L),
                                              min_child_weight = c(1L, 10L),
                                              subsample = c(0.5, 1),
                                              eta = c(0.2, 0.5),
                                              gamma = c(0, 1),
                                              colsample_bytree = c(0.5, 1),
                                              scale_pos_weight = c(0.2, 0.5),
                                              nrounds = c(20L, 200L)),
                                init_grid_dt = NULL, init_points = 10, n_iter = 10,
                                acq = "ucb", kappa = 2.576, eps = 0.0,
                                verbose = TRUE)
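
One commonly suggested mitigation, sketched below under the assumption that the optimizer is being rewarded for ever-larger nrounds: drop nrounds from the search space, give xgb.cv a generous cap, and let early stopping on the held-out folds choose the tree count, scoring at the early-stopped best iteration. dmatrix_treino, cv_folds, and IM_KS are the objects from the snippet above; xgb_cv_bayes_es is a hypothetical name, not part of the package.

# Sketch: remove nrounds from the search space and let early stopping choose it.
xgb_cv_bayes_es <- function(max.depth, eta, gamma, colsample_bytree,
                            subsample, min_child_weight, scale_pos_weight) {
  cv <- xgb.cv(params = list(booster = "gbtree",
                             eta = eta,
                             max_depth = max.depth,
                             gamma = gamma,
                             min_child_weight = min_child_weight,
                             subsample = subsample,
                             colsample_bytree = colsample_bytree,
                             scale_pos_weight = scale_pos_weight,
                             lambda = 1, alpha = 0,
                             objective = "binary:logistic",
                             eval_metric = IM_KS),
               data = dmatrix_treino,
               nrounds = 1000,                  # generous cap; early stopping picks the real value
               folds = cv_folds, prediction = TRUE,
               early_stopping_rounds = 20, maximize = TRUE, verbose = 0)
  # Score at the early-stopped best iteration, so trees added past the
  # validation optimum can never raise the score the optimizer sees.
  list(Score = cv$evaluation_log$test_ks_mean[cv$best_iteration],
       Pred = cv$pred)
}

With this setup, nrounds = c(20L, 200L) is dropped from bounds, the tuned tree count can be read off cv$best_iteration when refitting on the full training data, and the optimizer no longer gains by pushing nrounds to its upper bound.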