mlr-org / mlr3mbo

Flexible Bayesian Optimization in R
https://mlr3mbo.mlr-org.com

Autotuner with XGB #137

Closed fredho-42 closed 5 months ago

fredho-42 commented 7 months ago

Hi,

I'm trying to replicate this, and the code fails at xgboost_at_bo$train(task, row_ids = train_indxs) with the error message: Error in init(env) : For early stopping, watchlist must have at least one element

I tried to remove the early stopping argument in the learner, but that gives me another error at xgboost_at_bo$train(task, row_ids = train_indxs): Error in predict.xgb.Booster(model, newdata = newdata) : Feature names stored in `object` and `newdata` are different!

R version 4.3.2 with the latest mlr3 packages and XGBoost package (GPU-enabled binary for Windows).

It looks like there is something wrong with the wrapper, but I'm not sure how I can fix that. Any suggestions would be appreciated. Thanks.

Fred

sumny commented 7 months ago

Hi @fredho-42, and sorry for the late reply. Thanks for the issue; however, this most likely has nothing to do with mlr3mbo but is rather an issue with the "surv.xgboost" Learner (now in mlr3extralearners) or with the AutoTuner in mlr3tuning and early stopping in general. To see this, try replacing tnr("mbo") with tnr("random_search"):

library(mlr3)
library(mlr3extralearners)
library(mlr3pipelines)
library(mlr3tuning)
library(mlr3proba)
library(survival)

# Less logging
lgr::get_logger("bbotk")$set_threshold("warn")
lgr::get_logger("mlr3")$set_threshold("warn")

set.seed(42)
train_indxs = sample(seq_len(nrow(veteran)), 100)
task = as_task_surv(x = veteran, time = "time", event = "status")
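# one-hot encode the factor columns, since xgboost only handles numeric features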
poe = po("encode")
task = poe$train(list(task))[[1]]
task

ncores = 4
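# note: early stopping is requested via early_stopping_rounds,
# but no early-stopping callback is set (see the discussion below)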
learner = lrn("surv.xgboost",
  nthread = ncores, booster = "gbtree", early_stopping_rounds = 10,
  nrounds = to_tune(50, 1000),
  eta = to_tune(p_dbl(1e-04, 1, logscale = TRUE)),
  max_depth = to_tune(2, 10))

# Random Search
xgboost_at_rs = AutoTuner$new(
  learner = learner,
  resampling = rsmp("cv", folds = 5),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 30),
  tuner = tnr("random_search")
)

xgboost_at_rs$train(task, row_ids = train_indxs)
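# this should fail with the same watchlist error as with tnr("mbo"),
# showing that the problem is not specific to mlr3mbo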

Tuning with early stopping usually requires you to set the appropriate callback; see https://mlr-org.com/gallery/optimization/2022-11-04-early-stopping-with-xgboost/ for an example. (Also, if you early stop during tuning, you likely do not want to tune the number of boosting iterations but simply set it to a very high value.)

For example, tuning XGBoost on the iris task with early stopping via an AutoTuner would look like this:

library(mlr3)
library(mlr3learners)
library(mlr3tuning)

set.seed(42)
task = tsk("iris")

learner = lrn("classif.xgboost",
  booster = "gbtree", early_stopping_rounds = 10,
  nrounds = 1000,
  eta = to_tune(p_dbl(1e-04, 1, logscale = TRUE)),
  max_depth = to_tune(2, 10),
  early_stopping_set = "test")

# Random Search
xgboost_at_rs = AutoTuner$new(
  learner = learner,
  resampling = rsmp("cv", folds = 5),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 10),
  tuner = tnr("random_search"),
  callbacks = clbk("mlr3tuning.early_stopping")
)

xgboost_at_rs$train(task)

Note the early_stopping_set = "test" and callbacks = clbk("mlr3tuning.early_stopping") lines.
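Since your original question was about tnr("mbo"): the same early-stopping setup should also work with the MBO tuner. A minimal sketch, assuming mlr3mbo and the dependencies of its default surrogate are installed and tnr("mbo") is used with its defaults (the object name xgboost_at_bo just mirrors your snippet):

library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(mlr3mbo)

set.seed(42)
task = tsk("iris")

# same learner as above: nrounds fixed at a high value, early stopping on the test set
learner = lrn("classif.xgboost",
  booster = "gbtree", early_stopping_rounds = 10,
  nrounds = 1000,
  eta = to_tune(p_dbl(1e-04, 1, logscale = TRUE)),
  max_depth = to_tune(2, 10),
  early_stopping_set = "test")

# Bayesian Optimization via tnr("mbo") with its defaults
xgboost_at_bo = AutoTuner$new(
  learner = learner,
  resampling = rsmp("cv", folds = 5),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 10),
  tuner = tnr("mbo"),
  callbacks = clbk("mlr3tuning.early_stopping")
)

xgboost_at_bo$train(task)

The only change compared to the random search example is tuner = tnr("mbo").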

Finally, can you please post the output of sessionInfo()?