
glmnet - Parameter Tuning s and lambda #1030

Closed: larry77 closed this issue 8 years ago

larry77 commented 8 years ago

Dear All, I see that glmnet has already been debated several times; see, for instance:

https://github.com/mlr-org/mlr/issues/824
https://github.com/mlr-org/mlr/pull/915
https://github.com/mlr-org/mlr/issues/106

I have some questions (I post the sessionInfo() at the end of the post). If I type

```r
lrn = makeLearner("regr.glmnet")
getParamSet(lrn)
```

then I see the parameters alpha, lambda and s as tunable. First a detail: I see that s still appears to be constrained in [0, 1], but if I understand pull request #915 correctly, this should no longer be the case. Is that right?

Second, I am having second thoughts about the meaning of the parameters, in particular the difference between lambda and s (defined as the shrinkage parameter in issue #106). If I look at the formula in the glmnet vignette

https://cran.r-project.org/web/packages/glmnet/vignettes/glmnet_beta.html#intro
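(for reference, writing out the elastic-net objective from the vignette as I read it:

$$
\min_{\beta_0,\,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^\top\beta\bigr)^2
\;+\;\lambda\Bigl[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\Bigr]
$$

)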

there are really only two parameters to tune, lambda and alpha. What is the role of s?

In the documentation

https://cran.r-project.org/web/packages/glmnet/glmnet.pdf

on page 20, s is defined as

Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model.

So is s simply the same as lambda, but at prediction time? I want to tune a glmnet regression model in mlr without drowning in too many technicalities (just alpha and lambda). How should I do that? For alpha I can choose any grid in [0, 1] that I want, but what should I do for lambda? So far I have played with s in [0, 1], but now I fear that is total nonsense. I also do not understand whether I can just let glmnet find the optimal lambda at the end of the regularization path. As an example, does it make sense to do this:

```r
lrn = makeLearner("regr.glmnet")
ps = makeParamSet(
  makeDiscreteParam("s", values = seq(0, 1, by = 0.1)),
  makeDiscreteParam("alpha", values = seq(0, 1, by = 0.1))
)
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("RepCV", folds = 3, reps = 3, predict = "test")
res = tuneParams(lrn, task = bh.task, resampling = rdesc, par.set = ps,
                 control = ctrl, show.info = FALSE)
```

or should I tune lambda instead of s? Any ideas? Many thanks.

```
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] mlr_2.8          ParamHelpers_1.7 ggplot2_2.1.0    BBmisc_1.9
[5] glmnet_2.0-5     foreach_1.4.3    Matrix_1.2-6

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5      magrittr_1.5     splines_3.3.1    munsell_0.4.3
 [5] xtable_1.8-2     colorspace_1.2-6 lattice_0.20-33  R6_2.1.2
 [9] dplyr_0.5.0      stringr_1.0.0    plyr_1.8.4       tools_3.3.1
[13] parallel_3.3.1   grid_3.3.1       checkmate_1.8.0  gtable_0.2.0
[17] DBI_0.4-1        ggvis_0.4.2      htmltools_0.3.5  iterators_1.0.8
[21] survival_2.39-4  assertthat_0.1   digest_0.6.9     tibble_1.0
[25] shiny_0.13.2     reshape2_1.4.1   codetools_0.2-14 mime_0.4
[29] parallelMap_1.3  stringi_1.1.1    scales_0.4.0     backports_1.0.2
[33] httpuv_1.3.3
```

larry77 commented 8 years ago

By looking at how caret deals with glmnet

https://stats.stackexchange.com/questions/69638/does-caret-train-function-for-glmnet-cross-validate-for-both-alpha-and-lambda

it looks like only alpha is needed during training: glmnet fits a regularization path over many values of lambda (not supplied by the user), and the model is tuned along that path. Then I can predict at different values of lambda (called s in the glmnet documentation) and select the optimal one (see the sketch below).

I just need to make sure this is handled properly in mlr, and how to use it. I suppose that caret does the lambda <-> s switch under the hood, but there you cannot go wrong, because you can only tune alpha and lambda. Please advise on this.
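For illustration, a minimal plain-glmnet sketch of this fit-once / predict-at-s pattern (toy data, purely to show the mechanics):

```r
library(glmnet)

# toy data, purely for illustration
set.seed(1)
x = matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
y = rnorm(100)

# training fits the whole regularization path for a fixed alpha;
# no lambda values are supplied by the user
fit = glmnet(x, y, alpha = 0.5)

# the penalty value s (i.e. lambda) only enters at prediction time
preds = predict(fit, newx = x, s = c(0.01, 0.1))
```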

studerus commented 8 years ago

In the documentation of glmnet, it says:

Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warm starts for speed, and it is often faster to fit a whole path than compute a single fit.

Hence, we fit the learner not for just one single value of lambda but for a whole sequence of lambdas. Only during prediction do we specify the value of lambda at which we want to make a prediction, which is called s in this case. If this is confusing, we could think about changing the name of the parameter s to lambda. On the other hand, our policy is not to change argument names of the underlying functions.
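For concreteness, a minimal sketch of how this looks from the mlr side (using bh.task as in the example above; the value of s is arbitrary):

```r
library(mlr)

# s is a hyperparameter of the learner, but it only takes effect at
# prediction time; training still fits the whole lambda sequence
lrn = makeLearner("regr.glmnet", s = 0.05)
mod = train(lrn, bh.task)
pred = predict(mod, task = bh.task)
```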

schiffner commented 8 years ago

Hi,

First a detail: I see that s still appears to be constrained in [0, 1], but if I understand pull request #915 correctly, this should no longer be the case. Is that right?

Yes, you are correct. Sorry for that. We need to change that.

EDIT: Sorry, I misunderstood. The range for s was already corrected in #915. But we are missing a lower = 0 for lambda in the parameter set.
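In the learner's parameter set this would amount to something like (a sketch, not the actual patch):

```r
# lambda is a vector parameter of glmnet; the fix is to bound it below by 0
makeNumericVectorLearnerParam(id = "lambda", lower = 0)
```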

So is s simply the same as lambda, but at prediction time?

Yes. s can either be a value included in the lambda sequence or, if it is not, the prediction at s is obtained by interpolation (the default) or by refitting.
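In plain glmnet this distinction is controlled by the exact argument of predict(); a small sketch (toy data; note that recent glmnet versions require re-supplying the training data when exact = TRUE):

```r
library(glmnet)

set.seed(1)
x = matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
y = rnorm(100)
fit = glmnet(x, y)

# s not part of the computed lambda sequence: interpolation (the default)
p_interp = predict(fit, newx = x, s = 0.123)

# exact = TRUE refits the path with s merged in, instead of interpolating
# (recent glmnet versions need the training data passed again)
p_exact = predict(fit, newx = x, s = 0.123, exact = TRUE, x = x, y = y)
```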

Tuning just s and alpha is fine, and your code looks good to me. What happens is: during training glmnet computes its own lambda sequence (the whole regularization path), and during prediction the tuned value of s is plugged in, with interpolation if s is not part of the computed sequence.

To get a feeling for a meaningful maximum value of s in the parameter set used for tuning, I sometimes just train glmnet and check the maximum of the calculated lambda sequence (see the sketch below). In my experience, that maximum is in most cases smaller than 1.
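A quick sketch of that check (again using bh.task):

```r
library(mlr)

# train glmnet once with default settings and inspect the maximum of
# the lambda sequence it computed internally
mod = train(makeLearner("regr.glmnet"), bh.task)
max(getLearnerModel(mod)$lambda)
```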

Best, Julia

berndbischl commented 8 years ago

But we are missing a lower = 0 for lambda in the parameter set.

So we add that, make the "note" in glmnet better to explain what was discussed here, and we are good?

larry77 commented 8 years ago

Thanks a lot for the clarifications and for putting my mind at rest. Keep up the good work!

berndbischl commented 8 years ago

So we add that, make the "note" in glmnet better to explain what was discussed here, and we are good?

Can somebody who knows glmnet at least a bit better please quickly do this here so we can close? (We have had too many questions about this.)

berndbischl commented 8 years ago

Thanks to Julia for doing PR #1033. Closing.