yanyachen / rBayesianOptimization

Bayesian Optimization of Hyperparameters

init_grid_dt error when init_points=0 #13

Closed yilisg closed 7 years ago

yilisg commented 7 years ago

Hi Yanya, fantastic package and great work!

Could be related to #1. A reproducible example is below (note that I've changed early.stopping.rounds to early_stopping_rounds and test.auc.mean to test_auc_mean from the original example code, as the former names were deprecated in xgboost 0.6+).

# Example 2: Parameter Tuning
library(xgboost)
library(rBayesianOptimization)  # provides KFold() and BayesianOptimization()
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data,
                      label = agaricus.train$label)
cv_folds <- KFold(agaricus.train$label, nfolds = 5,
                  stratified = TRUE, seed = 0)
xgb_cv_bayes <- function(max.depth, min_child_weight, subsample) {
  cv <- xgb.cv(params = list(booster = "gbtree", eta = 0.01,
                             max_depth = max.depth,
                             min_child_weight = min_child_weight,
                             subsample = subsample, colsample_bytree = 0.3,
                             lambda = 1, alpha = 0,
                             objective = "binary:logistic",
                             eval_metric = "auc"),
               data = dtrain, nrounds = 100,
               folds = cv_folds, prediction = TRUE, showsd = TRUE,
               early_stopping_rounds = 5, maximize = TRUE, verbose = 0)
  list(Score = max(cv$evaluation_log$test_auc_mean),
       Pred = cv$pred)
}
OPT_Res <- BayesianOptimization(xgb_cv_bayes,
                                bounds = list(max.depth = c(2L, 6L),
                                              min_child_weight = c(1L, 10L),
                                              subsample = c(0.5, 0.8)),
                                init_grid_dt = NULL, init_points = 2, n_iter = 5,
                                acq = "ucb", kappa = 2.576, eps = 0.0,
                                verbose = TRUE)

# working so far, but the error below occurs once I supply init_grid_dt and set init_points = 0
OPT_Res <- BayesianOptimization(xgb_cv_bayes,
                                bounds = list(max.depth = c(2L, 6L),
                                              min_child_weight = c(1L, 10L),
                                              subsample = c(0.5, 0.8)),
                                init_grid_dt = data.frame(max.depth = 2L,
                                                          min_child_weight = 1L,
                                                          subsample = 0.5),
                                init_points = 0, n_iter = 5,
                                acq = "ucb", kappa = 2.576, eps = 0.0,
                                verbose = TRUE)

Error in GPfit::GP_fit(X = Par_Mat[Rounds_Unique, ], Y = Value_Vec[Rounds_Unique], : The dimensions of X and Y do not match.

# one workaround seems to be setting init_points = 1
OPT_Res <- BayesianOptimization(xgb_cv_bayes,
                                bounds = list(max.depth = c(2L, 6L),
                                              min_child_weight = c(1L, 10L),
                                              subsample = c(0.5, 0.8)),
                                init_grid_dt = data.frame(max.depth = 2L,
                                                          min_child_weight = 1L,
                                                          subsample = 0.5),
                                init_points = 1, n_iter = 5,
                                acq = "ucb", kappa = 2.576, eps = 0.0,
                                verbose = TRUE)
yilisg commented 7 years ago

An off-topic question about your choice of "ucb" as the default acquisition function. In the "Practical Bayesian Optimization of Machine Learning Algorithms" paper, the authors seem to conclude that "ei" outperformed "ucb" in almost every case. Do you agree, and would you consider making "ei" the default for the BayesianOptimization function? On a related note, how do you think about the eps parameter for "ei": is it usually close to 0 (say 0.001, 0.01, 0.02) or more "gamma"-like (say 1, 2, 5, 10, 25, 100)? I am also curious about the origin of the default kappa of 2.576. Thanks for your thoughts.

yanyachen commented 7 years ago

The error doesn't come from setting init_points = 0. For the init_grid_dt argument, it is usually better to supply some manual tuning results or several manually chosen points in hyperparameter space. The GP can't be fit on only 1 observation, so in principle you should provide at least 2 different sample points in init_grid_dt.

# please set at least 2 different sample points in init_grid_dt
OPT_Res <- BayesianOptimization(FUN = xgb_cv_bayes,
                                bounds = list(max.depth = c(2L, 6L),
                                              min_child_weight = c(1L, 10L),
                                              subsample = c(0.5, 0.8)),
                                init_grid_dt = data.frame(max.depth = c(2L, 3L), 
                                                          min_child_weight = c(1L, 2L),
                                                          subsample = c(0.5, 0.6)),
                                init_points = 0, n_iter = 5,
                                acq = "ucb", kappa = 2.576, eps = 0.0,
                                verbose = TRUE)
yanyachen commented 7 years ago

To my understanding, gamma is like a "Z-score", whereas eps should be on the same scale as the evaluation metric. I set "ucb" as the default because it's the easiest one to understand. "ei" requires more tuning work on eps, and the best value differs depending on the evaluation metric and your data.
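For what it's worth, the default kappa = 2.576 matches the standard normal quantile for a two-sided 99% interval, which may well be where it comes from (my assumption, not confirmed by the author):

```r
# kappa = 2.576 equals qnorm(0.995), the standard normal quantile for a
# two-sided 99% interval, so UCB scores each point as mean + 2.576 * sd,
# i.e. an upper 99% confidence bound on the GP prediction.
kappa <- qnorm(0.995)
round(kappa, 3)
#> [1] 2.576
```

Switching acquisition functions is then just a matter of passing acq = "ei" and tuning eps on the scale of the evaluation metric, as discussed above.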

yilisg commented 7 years ago

Thanks for the clarification. Having more than one initial point solves the problem. Not a blocking issue, but I would suggest emitting an informative error message when init_grid_dt contains fewer than two distinct points.