topepo / caret

caret (Classification And Regression Training): an R package containing miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html
1.61k stars 632 forks

consistently avoid dense matrix conversion for glmnet(x = ...) #1315

Closed powerpak closed 1 year ago

powerpak commented 2 years ago

(This is a more complete version of the fix submitted for #1096.)

Currently, some of the glmnet model functions check whether the training data is a sparseMatrix, and some don't. As a result, the initial operations in train() might succeed, but a later step in the workflow fails (usually with "Cholmod error 'problem too large'" for a sparseMatrix with very large dimensions) because some of the training data is inadvertently converted to an (impossibly large) dense matrix.

For instance, this bug currently occurs whenever prob() in glmnet.R is called (which happens with trainControl(classProbs = TRUE)). It also occurs if tuneLength is used instead of tuneGrid in train(), because tuneLength = ... triggers a call to grid() in glmnet.R, which does not check for a sparseMatrix before executing Matrix::as.matrix().
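
To illustrate the kind of guard described above, here is a minimal sketch. The helper name `as_glmnet_x` is hypothetical and is not a function in caret or glmnet. It only shows the idea of checking for a sparseMatrix before any dense conversion, and assumes the Matrix package is installed.

```r
library(Matrix)

# Hypothetical helper (not part of caret): return predictors in a form
# glmnet can consume, without ever densifying sparse input.
as_glmnet_x <- function(x) {
  # Sparse matrices and ordinary matrices are passed through untouched.
  if (methods::is(x, "sparseMatrix") || is.matrix(x)) {
    return(x)
  }
  # Anything else (e.g. a data.frame) is converted to a dense matrix.
  as.matrix(x)
}

# A small sparse matrix and a small data frame to exercise both branches.
x_sparse <- Matrix::sparseMatrix(i = c(1, 3), j = c(2, 4),
                                 x = c(1.5, 2), dims = c(3, 4))
x_df <- data.frame(a = 1:3, b = c(0.5, 0, 2))

x1 <- as_glmnet_x(x_sparse)  # stays sparse
x2 <- as_glmnet_x(x_df)      # becomes a base R matrix
```

Applying the same check in every function that touches x (not just the fitting step) is the point of this PR, since a single unguarded as.matrix() call is enough to trigger the error on a very large sparseMatrix.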

topepo commented 1 year ago

@powerpak thanks for making this change