Add `lambda.min.ratio` argument again

Larsvanderlaan commented 3 years ago

When making the changes to the standardize argument, I accidentally removed some features including the lambda.min.ratio argument and the cv_select = FALSE prediction matrix fix. As the cv_select = FALSE bug is already fixed, I won't make any changes regarding this.

nhejazi commented 3 years ago

Thanks for the work on these changes. A few comments below. Let's try to break up this PR into smaller, incremental changes that we can better document, test, and review. For now, the CI builds are failing due to incomplete documentation or errors in required documentation fields. Specific comments:

I don't think we should pass in lambda.min.ratio as a named argument through fit_hal to glmnet. This bypasses a check for the lambda.min.ratio argument (https://github.com/tlverse/hal9001/blob/devel/R/hal.R#L205-L207) that came to be over the course of testing of simulation experiments. This exists because glmnet silently ignores that argument when family != "gaussian", as mentioned here https://www.rdocumentation.org/packages/glmnet/versions/4.1-1/topics/glmnet.
In general, I think we should aim to pass in as many glmnet-specific arguments via ... as possible (instead of as named arguments), overriding them where necessary (with messages to the user) as with, e.g., standardize = FALSE.
I'm not sure if your comment above means that you were planning to address the issue of matrix predictions when cv_select = FALSE, so I'm just noting that this has been resolved by #85.

jeremyrcoyle commented 3 years ago

lambda.min.ratio works for both gaussian and binomial families. The documentation just says that if small lambdas yield a saturated model, it wont continue to fit even smaller lambdas, which is fine, and not the same as silently ignoring the argument. I suggest we remove the check about lambda.min.ratio

Here's some code to confirm:

x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit1 = glmnet(x, y)
fit2 = glmnet(x,y,lambda.min.ratio=1e-6)
min(fit1$lambda)
min(fit2$lambda)
x = matrix(rnorm(100 * 20), 100, 20)

y2 = rbinom(100,1,0.5)
fit1 = glmnet(x, y2, family=binomial())
fit2 = glmnet(x,y2,lambda.min.ratio=1e-6)
min(fit1$lambda)
min(fit2$lambda)

We've seen that not setting lambda.min.ratio can lead to no lambda values that fit the data sufficiently well, and so I believe its appropriate to override the default.

What's the advantage of using dot args when we're trying to override the defaults of glmnet (and be explicit to the end user about it)?

rachaelvp commented 3 years ago

The warning that is suppressed is returned by glmnet and goes something like this: "Convergence for n th lambda value not reached after maxit=i iterations; solutions for larger lambdas returned".

This warning is ok, and is related to what Jeremy mentioned above about small lambdas yielding a saturated model and glmnet terminating the fitting for even smaller lambdas.

I only suppressed it because I thought the tests would fail if a warning was produced.

nhejazi commented 3 years ago

Got it, thanks for clarifying where the warnings are coming from. I’d always taken that warning to indicate a serious issue with fitting across the glmnet regularization path (since you don’t get predictions for each lambda), but fine to ignore, I guess. Sounds like it’s only showing up just now because we explicitly set lambda.min.ratio to a small value?

tlverse / hal9001

Add `lambda.min.ratio` argument again #86