mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com

Hyperparameter tuning of mstop in mboost using mlr #2691

Closed bernard-liew closed 4 years ago

bernard-liew commented 4 years ago

Dear mlr members,

Many thanks for introducing this package, which I am just getting familiar with. This question is about hyperparameter tuning of the mstop parameter in mboost (#1042).

The CV procedure from the example in the mboost package determines the optimal mstop very quickly. When using mlr and following the tuning principles from your website (https://mlr.mlr-org.com/articles/tutorial/tune.html), the tuning process is very slow (which may be because I am not doing things correctly).

Questions: 1) Is there a more efficient method of tuning mboost models in mlr? 2) Can (1) be done in a nested resampling setup, as you would for any tuning on your website (https://mlr.mlr-org.com/articles/tutorial/nested_resampling.html)?

library(mlr)
#> Loading required package: ParamHelpers
library(mboost)
#> Loading required package: parallel
#> Loading required package: stabs
#> 
#> Attaching package: 'stabs'
#> The following object is masked from 'package:mlr':
#> 
#>     subsample
#> This is mboost 2.9-1. See 'package?mboost' and 'news(package  = "mboost")'
#> for a complete list of changes.
library(bench)

data("bodyfat", package = "TH.data")

# Method 1 from mboost#
mod.boost <- glmboost(DEXfat ~ .,
  data = bodyfat,
  control = boost_control(mstop = 1000),
  center = TRUE)

bench::system_time(
  cvm <- cvrisk(mod.boost)
)
#> process    real 
#> 73.66ms   2.28s
mod.boost[mstop(cvm)]
#> 
#>   Generalized Linear Models Fitted via Gradient Boosting
#> 
#> Call:
#> glmboost.formula(formula = DEXfat ~ ., data = bodyfat, center = TRUE,     control = boost_control(mstop = 1000))
#> 
#> 
#>   Squared Error (Regression) 
#> 
#> Loss function: (y - f)^2 
#>  
#> 
#> Number of boosting iterations: mstop = 58 
#> Step size:  0.1 
#> Offset:  30.78282 
#> 
#> Coefficients: 
#>  (Intercept)          age    waistcirc      hipcirc elbowbreadth  kneebreadth 
#> -99.27777374   0.01012336   0.18930461   0.35078804  -0.03232033   1.60539381 
#>     anthro3a     anthro3b     anthro3c 
#>   3.32686027   3.60515479   0.57299627 
#> attr(,"offset")
#> [1] 30.78282

# Method 2 using mlr #
practask <- makeRegrTask(id = "prac", data = bodyfat, target = "DEXfat")
lrn.boost <- makeLearner("regr.glmboost",
  family = "Gaussian",
  center = TRUE,
  nu = 0.001) # note: much smaller than mboost's default step size of 0.1 used in Method 1

# Grid search enumerates every value of a discrete parameter, so all 1000
# candidate mstop values are refit from scratch under each bootstrap iteration.
ps <- makeParamSet(
  makeDiscreteParam("mstop", values = seq(1, 1000, 1))
)
inner <- makeResampleDesc("Bootstrap", iters = 2)
ctrl <- makeTuneControlGrid()

bench::system_time(
  lrn.boost <- tuneParams(lrn.boost,
    task = practask,
    resampling = inner,
    par.set = ps,
    control = ctrl,
    show.info = FALSE)
)
#> process    real 
#>   2.65m   2.65m

Created on 2019-11-28 by the reprex package (v0.3.0)

Session info (abridged from devtools::session_info()): R version 3.6.1 Patched (2019-08-30 r77101) on macOS Mojave 10.14.6; attached packages: bench 1.0.4, mboost 2.9-1, mlr 2.16.0, ParamHelpers 1.12.0.9000, stabs 0.6-3.

Kind regards, Bernard

larskotthoff commented 4 years ago

The duration of the tuning process is entirely up to you -- you can choose the number of evaluations etc. to affect this (in your case you've set this to 25). This may not achieve the same result, but how long the process takes is completely in your hands.

You can certainly tune this in a nested resampling setting.
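
A hedged sketch of both points, reusing practask from the reprex above: random search with a fixed evaluation budget replaces the full grid, and makeTuneWrapper embeds the tuning into an outer resampling loop. The budget of 25 evaluations and the CV sizes are illustrative assumptions, not recommendations.

library(mlr)

# Random search with a fixed budget instead of a full grid:
ps.rs <- makeParamSet(
  makeIntegerParam("mstop", lower = 1, upper = 1000)
)
ctrl.rs <- makeTuneControlRandom(maxit = 25)

# Nested resampling: tuning runs on the inner CV, performance is
# estimated on the outer CV of the wrapped learner.
inner <- makeResampleDesc("CV", iters = 3)
outer <- makeResampleDesc("CV", iters = 5)
lrn.tuned <- makeTuneWrapper("regr.glmboost",
  resampling = inner,
  par.set = ps.rs,
  control = ctrl.rs)
res <- resample(lrn.tuned, practask, resampling = outer,
  extract = getTuneResult)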

berndbischl commented 4 years ago

He is asking about something else and has a valid concern. @bernard-liew says that mboost implements a special strategy to efficiently tune the mstop iterations internally. I will post a better explanation and some hints later.

pat-s commented 4 years ago

(added a proper reprex)

berndbischl commented 4 years ago

Hi @bernard-liew, how many / which parameters do you normally want to tune in such a case? Only mstop, or multiple params in combination?

1) mlr's tuning algorithms mainly shine in the case where you have more than one param, i.e. multivariate optimization. Then efficiency is mainly influenced by selecting an appropriate optimizer, like Bayesian optimization.

2) The second point is that mboost uses a specific trick to evaluate a sequence of models, which only works for mstop. That is custom code inside of mboost to achieve exactly this, and it is not simple to generalize (see the sketch after this list).

3) We are aware of this "gap" and, as we like to use (m)boosting ourselves, we are working on it further. But all of the new developments for this are going into mlr3. Can I suggest that you start looking at that?

4) A very (!) nice approach IMHO to achieve something similar to 2) is using hyperband as a tuner for mboost. mlr3 has an experimental version of that. I wrote a short script yesterday and tried it out, and it looked quite nice. Are you interested in seeing that?
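
To make point 2) concrete, here is a minimal sketch of the mechanism, using the same bodyfat data as the reprex. The key is mboost's subscript method, which moves an already fitted model along the boosting path in place instead of refitting.

library(mboost)
data("bodyfat", package = "TH.data")

# Fit once with a generous mstop ...
mod <- glmboost(DEXfat ~ ., data = bodyfat,
  control = boost_control(mstop = 1000), center = TRUE)

# ... then change the number of iterations via subscripting. This does not
# refit: `[.mboost` truncates (or extends) the existing model in place.
mod[500]
mstop(mod)
#> [1] 500

# cvrisk() exploits the same mechanism: one fit per fold, after which the
# empirical risk can be read off along the entire mstop path.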

bernard-liew commented 4 years ago

Dear @berndbischl ,

Many thanks for your help. For now I am largely interested in mstop, but at the back of my mind I was also thinking of tuning the learning rate, nu.

I definitely can do the tuning directly within mboost, but as you mentioned on the website, evaluating the model's performance on the training set may be biased high. It would be useful to see the script you used for nested resampling with hyperband tuning. Much appreciated.

Now I am deviating, but I will not dig into details (or maybe I will open another question). Question: is this issue in mboost similar to the cv.glmnet hyperparameter tuning? That is, is there some special tuning process going on that is specific to the package, which is not easy to generalize?

Kind regards, Bernard

pat-s commented 4 years ago

@bernard-liew You can find our work on hyperband tuning at https://github.com/mlr-org/mlr3hyperband.
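
For reference, a rough, non-authoritative sketch of what hyperband tuning of mstop could look like with the current mlr3 stack. The API details here are assumptions that have evolved since this thread: regr.glmboost now lives in mlr3extralearners, and the fidelity parameter is marked with a "budget" tag.

library(mlr3)
library(mlr3tuning)
library(mlr3hyperband)
library(mlr3extralearners)  # provides lrn("regr.glmboost")
library(paradox)

data("bodyfat", package = "TH.data")
task <- as_task_regr(bodyfat, target = "DEXfat")

# mstop acts as the fidelity/budget parameter: hyperband evaluates many
# configurations at small mstop and promotes only the promising ones.
search_space <- ps(
  mstop = p_int(lower = 1, upper = 1000, tags = "budget"),
  nu = p_dbl(lower = 0.001, upper = 0.3)
)

instance <- tune(
  tuner = tnr("hyperband", eta = 3),
  task = task,
  learner = lrn("regr.glmboost"),
  resampling = rsmp("cv", folds = 3),
  search_space = search_space
)
instance$result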

Question: is this issue in mboost similar to the cv.glmnet hyperparameter tuning? That is, is there some special tuning process going on that is specific to the package, which is not easy to generalize?

Yes, you can see it like that. Some packages rely on "warm starts" and use certain heuristics which are highly tailored to that one hyperparameter.

This then works well and efficiently for this hyperparameter, whereas general tuning concepts like grid or random search will suffer in these cases (even though they are not necessarily doing a bad job, they just take more time).
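
As an illustration of the cv.glmnet analogy (a hedged sketch on the same bodyfat data): glmnet fits the entire lambda path in a single call, so cross-validating over lambda costs one path fit per fold rather than one fit per (fold, lambda) pair.

library(glmnet)
data("bodyfat", package = "TH.data")

x <- as.matrix(bodyfat[, setdiff(names(bodyfat), "DEXfat")])
y <- bodyfat$DEXfat

# One path fit per fold; the full lambda sequence falls out of each call.
cvfit <- cv.glmnet(x, y, nfolds = 10)
cvfit$lambda.min  # lambda minimizing the cross-validated error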

Please consider asking "usage"/"interpretation" questions on Stack Overflow (using the #mlr tag) or Cross Validated in the future. Thanks.