hongooi73 / glmnetUtils

Utilities for glmnet

cvAlpha.glmnet / cva.glmnet compatibility #15

Closed: dschneiderch closed this issue 7 years ago

dschneiderch commented 7 years ago

I was using cvAlpha.glmnet with an older version of glmnetUtils (pre-CRAN, I think). I've renamed my calls to cva.glmnet, but I sometimes get: Evaluation error: arguments imply differing number of rows: 0, 16613.

I think I traced this back to the new use.model.frame=FALSE argument.

Main point: it might be useful for cva.glmnet to check that its data input is an actual data frame, rather than a resample object from modelr or some other input type that won't work.

MWE:

library(nycflights13)
library(tidyverse)
library(glmnetUtils)
library(modelr)

# my function
## find glmnet model using  the best alpha (and 1se lambda)
gnet <- function(dF){
    nfolds=5
    myformula=as.formula(arr_delay~distance+sched_arr_time+sched_dep_time)
    bestalpha <- 0.216
    cvfit=cv.glmnet(myformula,data=dF,nfolds=nfolds,type.measure='mse',alpha=bestalpha,use.model.frame=FALSE)
    cvfit$alpha=bestalpha
    return(cvfit)
}

The following works fine:

mdl=gnet(flights)

But the next portion fails unless the function above is changed to use.model.frame=TRUE, or as.data.frame is called on the resample object first.

flights %>% 
    group_by(carrier) %>% 
    sample_frac(.4) %>% 
    do({
        dat2 <- 
            crossv_mc(.,n=10,test=0.1) %>% 
            mutate(mdlobj=map(train,gnet))
    })

I will note that using use.model.frame=TRUE takes much longer, even compared to the old package version. Is there anything else different?

hongooi73 commented 7 years ago

This is because the output from crossv_mc is not a data frame. It's a resample object, which has to be converted back to a data frame before you can do anything with it.

The base model.frame function is nice and does this conversion for you, but you shouldn't get into the habit of relying on this. At some point the computer will make an incorrect assumption and your analysis will blow up, or worse, silently give nonsense results.

As you found out, inserting an as.data.frame() call into your code will get it to run. I'll add this to glmnetUtils, along with a warning.
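A minimal sketch of that conversion, assuming only that modelr is installed. It uses the built-in mtcars data instead of flights to stay small, and the helper name as_plain_df is hypothetical; it is not part of glmnetUtils or modelr, and is not necessarily how the check will be implemented in the package.

```r
library(modelr)

# Hypothetical helper (not part of glmnetUtils or modelr): return a plain
# data frame, converting modelr resample objects and rejecting anything else.
as_plain_df <- function(x) {
    if (inherits(x, "resample")) {
        # modelr supplies an as.data.frame() method for resample objects
        return(as.data.frame(x))
    }
    if (!is.data.frame(x)) {
        stop("expected a data frame or a modelr resample object")
    }
    x
}

set.seed(1)
cv <- crossv_mc(mtcars, n = 3, test = 0.25)

train1 <- cv$train[[1]]
class(train1)          # "resample", not "data.frame"
is.data.frame(train1)  # FALSE

# Convert before handing the data to a modelling function
train_df <- as_plain_df(train1)
test_df <- as_plain_df(cv$test[[1]])
```

In the gnet function from the question, the equivalent change would be to call as.data.frame(dF) (or a helper like the one above) on the first line, before cv.glmnet sees the data.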

Other than that, using cv.glmnet and crossv_mc at the same time seems superfluous, since they're both doing cross-validation, but that's a separate issue.