Closed larry77 closed 8 years ago
By looking at how caret deals with glmnet, it looks like only alpha is needed when training. If the user does not supply lambdas, tuning relies on the regularization path with many lambdas that glmnet computes itself. Then I can predict with the values of lambda (called s in the glmnet documentation) and select the optimal one.
I just need to make sure this is handled properly in mlr, and to understand how to use it. I suppose that caret does the lambda <----> s switch under the hood, but there you cannot go wrong because you can tune only alpha and lambda. Please advise on this.
In the documentation of glmnet, it says:
Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warm starts for speed, and it's often faster to fit a whole path than compute a single fit.
Hence, we fit the learner not for just one single value of lambda but for a whole sequence of lambdas. Only during prediction do we specify the value of lambda for which we want to make a prediction, which in this case is called s. If this is confusing, we could think about changing the name of parameter s to lambda. On the other hand, we have the policy to not change argument names of underlying functions.
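To make this concrete, here is a minimal sketch of that workflow with plain glmnet (the data below is a stand-in, not from this thread):

library(glmnet)
x = as.matrix(mtcars[, -1])  # placeholder predictors
y = mtcars$mpg               # placeholder response
# Fit the whole regularization path; glmnet chooses the lambda sequence itself.
fit = glmnet(x, y, alpha = 0.5)
# Only at prediction time do we pick a concrete penalty value, via s.
predict(fit, newx = x, s = 0.1)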
Hi,
First a detail: I see that s still appears to be constrained to [0, 1], but if I understand pull request #915 correctly, this should no longer be the case. Is that correct?
Yes, you are correct. Sorry for that. We need to change that.
EDIT: Sorry, I misunderstood. The range for s was already corrected in #915. But we are missing a lower = 0 for lambda in the parameter set.
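For reference, the fix would roughly look like the following in the learner's parameter set (a sketch with ParamHelpers; the ids and bounds mirror the glmnet arguments, and the rest of the actual definition in mlr is omitted):

makeParamSet(
  makeNumericLearnerParam(id = "alpha", default = 1, lower = 0, upper = 1),
  # the missing lower bound discussed above:
  makeNumericVectorLearnerParam(id = "lambda", lower = 0),
  # s is only used at prediction time:
  makeNumericLearnerParam(id = "s", lower = 0, when = "predict")
)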
So is s simply the same as lambda, but at prediction time?
Yes. s can be either a value included in the lambda sequence or, if it is not, prediction at s is done by interpolation (the default) or by refitting. Tuning just s and alpha is fine, and your code looks good to me. What happens is that glmnet calculates a suitable lambda sequence and fits models for the whole sequence (and the given alpha); the prediction is then made at the given s.
To get a feeling for a meaningful maximum value for s in the parameter set required for tuning, I sometimes just train glmnet and check the maximum of the calculated lambda sequence. In my experience that is in most cases smaller than 1.
Best, Julia
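A small sketch illustrating both points above (checking the computed lambda sequence, and interpolation vs. exact refitting at an off-path s); the data is again just a placeholder:

library(glmnet)
x = as.matrix(mtcars[, -1])
y = mtcars$mpg
fit = glmnet(x, y, alpha = 0.5)
max(fit$lambda)  # a sensible upper bound for tuning s
# s off the computed path: linear interpolation by default ...
predict(fit, newx = x, s = 0.123)
# ... or exact refitting at that s (newer glmnet versions require the
# training data to be supplied again):
predict(fit, newx = x, s = 0.123, exact = TRUE, x = x, y = y)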
But we are missing a lower = 0 for lambda in the parameter set.
So we add that, and improve the "note" in the glmnet learner to explain what was discussed here? And then we are good?
Thanks a lot for the clarifications and for putting my mind at rest. Keep up the good work!
Can somebody who knows glmnet at least a bit better please quickly do this here, so we can close? (We have had too many questions about this.)
Thanks to Julia for doing PR #1033. Closing.
Dear All, I see that glmnet has already been debated several times; see for instance:
https://github.com/mlr-org/mlr/issues/824
https://github.com/mlr-org/mlr/pull/915
https://github.com/mlr-org/mlr/issues/106
I have some questions (and I post the sessionInfo() at the end of the post). If I type
library(mlr)
lrn = makeLearner("regr.glmnet")
getParamSet(lrn)
Then I see the parameters alpha, lambda and s as tunable. First a detail: I see that s still appears to be constrained to [0, 1], but if I understand pull request #915 correctly, this should no longer be the case. Is that correct?
Second, I am having second thoughts about the meaning of the parameters, and in particular about the difference between lambda and s (defined as the shrinkage parameter in issue #106). If I look at the formula in the glmnet vignette
https://cran.r-project.org/web/packages/glmnet/vignettes/glmnet_beta.html#intro
there are really two parameters to tune, lambda and alpha. What is the role of s?
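For reference, the elastic net objective from that vignette is (Gaussian case):

\min_{\beta_0, \beta} \; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top}\beta \right)^2 + \lambda \left[ \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]

so lambda sets the overall penalty strength, alpha mixes the ridge and lasso penalties, and s does not appear in the objective at all.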
In the documentation
https://cran.r-project.org/web/packages/glmnet/glmnet.pdf
on page 20, s is defined as the value(s) of the penalty parameter lambda at which predictions are required (the default being the entire sequence used to create the model).
So is s simply the same as lambda, but at prediction time? I want to tune a glmnet regression model in mlr without drowning in too many technicalities (just alpha and lambda). How should I do that? For alpha I can choose any grid in [0, 1] that I want, but what should I do for lambda? So far I have played with s in [0, 1], but now I fear that is total nonsense. I do not understand whether I can just let glmnet find the optimal lambda at the end of the regularization path. As an example, does it make sense to do this:
lrn = makeLearner("regr.glmnet")
ps = makeParamSet(
  makeDiscreteParam("s", values = seq(0, 1, by = 0.1)),
  makeDiscreteParam("alpha", values = seq(0, 1, by = 0.1))
)
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("RepCV", folds = 3, reps = 3, predict = "test")
res = tuneParams(lrn, task = bh.task, resampling = rdesc, par.set = ps,
  control = ctrl, show.info = FALSE)
or should I use lambda instead of s? Any ideas? Many thanks.
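For completeness, once such a tuning run finishes, the selected values could be applied roughly like this (a sketch using mlr's setHyperPars; res is the tuning result from the snippet above):

lrn.tuned = setHyperPars(lrn, par.vals = res$x)  # s and alpha from the best configuration
mod = train(lrn.tuned, bh.task)                  # final model on the full task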
locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] mlr_2.8          ParamHelpers_1.7 ggplot2_2.1.0    BBmisc_1.9
[5] glmnet_2.0-5     foreach_1.4.3    Matrix_1.2-6

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5      magrittr_1.5     splines_3.3.1    munsell_0.4.3
 [5] xtable_1.8-2     colorspace_1.2-6 lattice_0.20-33  R6_2.1.2
 [9] dplyr_0.5.0      stringr_1.0.0    plyr_1.8.4       tools_3.3.1
[13] parallel_3.3.1   grid_3.3.1       checkmate_1.8.0  gtable_0.2.0
[17] DBI_0.4-1        ggvis_0.4.2      htmltools_0.3.5  iterators_1.0.8
[21] survival_2.39-4  assertthat_0.1   digest_0.6.9     tibble_1.0
[25] shiny_0.13.2     reshape2_1.4.1   codetools_0.2-14 mime_0.4
[29] parallelMap_1.3  stringi_1.1.1    scales_0.4.0     backports_1.0.2
[33] httpuv_1.3.3