Closed Larsvanderlaan closed 3 years ago
Thanks for the work on these changes. A few comments below. Let's try to break up this PR into smaller, incremental changes that we can better document, test, and review. For now, the CI builds are failing due to incomplete documentation or errors in required documentation fields. Specific comments:
lambda.min.ratio
as a named argument through fit_hal
to glmnet
. This bypasses a check for the lambda.min.ratio
argument (https://github.com/tlverse/hal9001/blob/devel/R/hal.R#L205-L207) that came to be over the course of testing of simulation experiments. This exists because glmnet
silently ignores that argument when family != "gaussian"
, as mentioned here https://www.rdocumentation.org/packages/glmnet/versions/4.1-1/topics/glmnet.glmnet
-specific arguments via ...
as possible (instead of as named arguments), overriding them where necessary (with messages to the user) as with, e.g., standardize = FALSE
.cv_select = FALSE
, so I'm just noting that this has been resolved by #85.lambda.min.ratio
works for both gaussian and binomial families. The documentation just says that if small lambdas yield a saturated model, it wont continue to fit even smaller lambdas, which is fine, and not the same as silently ignoring the argument. I suggest we remove the check about lambda.min.ratio
Here's some code to confirm:
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit1 = glmnet(x, y)
fit2 = glmnet(x,y,lambda.min.ratio=1e-6)
min(fit1$lambda)
min(fit2$lambda)
x = matrix(rnorm(100 * 20), 100, 20)
y2 = rbinom(100,1,0.5)
fit1 = glmnet(x, y2, family=binomial())
fit2 = glmnet(x,y2,lambda.min.ratio=1e-6)
min(fit1$lambda)
min(fit2$lambda)
We've seen that not setting lambda.min.ratio
can lead to no lambda values that fit the data sufficiently well, and so I believe its appropriate to override the default.
What's the advantage of using dot args when we're trying to override the defaults of glmnet (and be explicit to the end user about it)?
The warning that is suppressed is returned by glmnet
and goes something like this:
"Convergence for n th lambda value not reached after maxit=i iterations; solutions for larger lambdas returned".
This warning is ok, and is related to what Jeremy mentioned above about small lambdas yielding a saturated model and glmnet
terminating the fitting for even smaller lambdas.
I only suppressed it because I thought the tests would fail if a warning was produced.
Got it, thanks for clarifying where the warnings are coming from. I’d always taken that warning to indicate a serious issue with fitting across the glmnet
regularization path (since you don’t get predictions for each lambda), but fine to ignore, I guess. Sounds like it’s only showing up just now because we explicitly set lambda.min.ratio
to a small value?
When making the changes to the
standardize
argument, I accidentally removed some features including thelambda.min.ratio
argument and thecv_select = FALSE
prediction matrix fix. As thecv_select = FALSE
bug is already fixed, I won't make any changes regarding this.