lizzyagibson closed this issue 4 years ago.
Hello, again.
I figured this out. `warn` is set to `FALSE` in the call to grpreg() inside cv.grpreg(). So the model is not converging, and the user is not warned that it isn't converging. cv.grpreg() still reports an optimal lambda, but that value isn't interpretable. I suggest making `warn = TRUE` the default, or using some other preferred way to notify users of convergence issues.
Thanks!
Thank you very much for bringing this to my attention. It took me a little while to understand what was happening here, but the issue is that (a) you have a large number of unpenalized features (`group = 0`) and (b) those features are highly correlated. For some cross-validation folds, then, you end up with an ill-conditioned feature matrix. In the example seed you sent, the condition number of the unpenalized part of the matrix was 274 (a value above 30 generally indicates a serious problem). Although `X` does not have to be full rank in penalized regression, if you're not applying a penalty, all the usual rules of classical regression must be followed.
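To make "ill-conditioned" concrete, here is a minimal sketch in base R on simulated data (not the NHANES example from this thread). It scales each column to unit length and takes the ratio of the largest to the smallest singular value, which is one common convention for the condition number; the thread does not say exactly how the value of 274 was computed, and the helper name `scaled_condition_number` is hypothetical. The point it shows is that a matrix can be numerically full rank and still be badly conditioned.

```r
# Sketch on simulated data (hypothetical; not the NHANES data from this issue).
set.seed(1)
n <- 100
z <- rnorm(n)

# Two nearly collinear columns plus one independent column, standing in for
# highly correlated unpenalized features (group = 0).
X0 <- cbind(a = z + rnorm(n, sd = 0.01),
            b = z + rnorm(n, sd = 0.01),
            c = rnorm(n))

# Hypothetical helper: scale each column to unit length, then return the ratio
# of the largest to the smallest singular value.
scaled_condition_number <- function(X) {
  Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  d <- svd(Xs)$d
  max(d) / min(d)
}

qr(X0)$rank                                 # numerically full rank (3 columns)...
scaled_condition_number(X0)                 # ...but well above the rule-of-thumb cutoff of 30
scaled_condition_number(X0[, c("a", "c")])  # dropping one of the near-duplicate pair helps
```

Dropping or combining one column of a near-duplicate pair, or letting those columns be penalized, corresponds to the two remedies named in the error message below: improve the conditioning of `X` or add penalization.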
grpreg has now been updated to (a) trigger an error, not a warning, in such situations, and (b) provide an informative message. As of grpreg 3.2.2, your code will produce this error message:
```
> cvfit <- cv.grpreg(x, y, g, penalty = "grLasso", seed = 1988)
Error: Algorithm failed to converge for any values of lambda. This indicates a
combination of (a) an ill-conditioned feature matrix X and (b) insufficient
penalization. You must fix one or the other for your model to be identifiable.
```
Thanks, Patrick! I always appreciate an interpretable error message. 🥇
Hello,
An example is attached, using R 3.6.0 and grpreg 3.2-1 (I did not have this problem with R 3.5.3 and grpreg 3.1-3). The data are publicly available (https://www.cdc.gov/nchs/nhanes/index.htm).
The `$cve` component returned by cv.grpreg() is sometimes a vector (correct) and sometimes a scalar (wrong), depending on the user-defined grouping and the user-set seed.
Plotting the result gives the expected cross-validation curve when `cve` is a vector, but fails when `cve` is a scalar.
Why is this happening? Many thanks!
grpreg_bug.zip