joshuaschwab / ltmle

Longitudinal Targeted Maximum Likelihood Estimation package
http://joshuaschwab.github.io/ltmle/

Lasso with SL.glmnet #15

Open PFMB opened 6 years ago

PFMB commented 6 years ago

Hi,

I tried to estimate the average treatment effect with 'ltmle' using only SL.library="SL.glmnet" (LASSO for variable selection), which results in:

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  one multinomial or binomial class has 1 or 0 observations; not allowed
Additional warning:
In FUN(X[[i]], ...) : Error in algorithm SL.glmnet
  The Algorithm will be removed from the Super Learner (i.e. given weight 0)

Error occured during call to SuperLearner:
Q.kplus1 ~ A.1 + L.1 + Y.1 + A.2 + L.2 + Y.2 + A.3
Note that some SuperLeaner libraries crash when called with continuous dependent variables, as in the case of initial Q regressions with continuous Y or subsequent Q regressions even if Y is binary.
The error reported is:
Error in system.time({ : All algorithms dropped from library

R code used:

set.seed(123)
n <- 36  # number of observations (particularly low)
t <- 3   # points in time

A <- data.frame(matrix(rbinom(n*t, 1, 0.6), n, t))
L <- A + data.frame(matrix(rnorm(n*t, 0, 1), n, t))
Y <- L * data.frame(matrix(rgamma(n*t, 2, 4), n, t))
df <- data.frame(A[,1], L[,1], Y[,1], A[,2], L[,2], Y[,2], A[,3], L[,3], Y[,3])  # assume A->L->Y
colnames(df) <- c("A.1","L.1","Y.1","A.2","L.2","Y.2","A.3","L.3","Y.3")
YRANGE <- c(min(Y), max(Y))

SL.lib1 <- c("SL.glmnet")
SL.lib2 <- c("SL.stepAIC","SL.knn","SL.gam","SL.glm.interaction")

ltmle_est <- ltmle(df,
                   Lnodes = c(2,5,8),
                   Anodes = c(1,4,7),
                   Cnodes = NULL,
                   Ynodes = c(3,6,9),
                   Yrange = YRANGE,
                   estimate.time = FALSE,
                   gbounds = c(0.05,1),
                   abar = list(treatment = rep(1L,t), control = rep(0L,t)),
                   SL.library = SL.lib1)

I assume that the internal transformation of the continuous response variable according to Yrange to [0,1] is the origin of the problem. I guess a change in the family argument is needed? I tried to define my own wrapper by modifying https://github.com/ecpolley/SuperLearner/blob/master/R/SL.glmnet.R without success. What do I need to change to fix this?

Many thanks and regards!

@ck37

joshuaschwab commented 6 years ago

Hi, I don't know glmnet especially well, but I think the problem is that glmnet does not work with family="binomial" and continuous Y:

n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
Y.continuous <- plogis(X %*% beta + rnorm(n))
Y.binary <- rbinom(n, 1, Y.continuous)
m1 <- SL.glmnet(Y.binary, X, X, family=binomial(), obsWeights = rep(1, n))      # ok
m2 <- SL.glmnet(Y.continuous, X, X, family=binomial(), obsWeights = rep(1, n))  # error
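To make the likely mechanism concrete, here is a base-R sketch that does not call glmnet at all. It assumes (this is my reading of the error message, not something confirmed in this thread) that the binomial path treats a numeric response vector as a factor and rejects any class with fewer than two observations. The helper `min_class_size` is hypothetical and exists only for illustration.

```r
# Base R only: mimic a "class size" check on a binary vs. a continuous outcome.
set.seed(1)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
Y.continuous <- drop(plogis(X %*% beta + rnorm(n)))  # values in (0, 1)
Y.binary <- rbinom(n, 1, Y.continuous)               # values in {0, 1}

# Hypothetical helper: smallest class count when y is treated as a factor.
min_class_size <- function(y) min(table(as.factor(y)))

min_binary <- min_class_size(Y.binary)          # two well-populated classes
min_continuous <- min_class_size(Y.continuous)  # every distinct value is its own class

print(c(binary = min_binary, continuous = min_continuous))
```

If that assumption holds, a continuous Y rescaled to [0,1] will always look like many one-observation classes, regardless of sample size.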

You could use a different SL.library, but if you only have 36 observations, you may be better off using glm instead of SuperLearner (SL.library = "glm", the default).
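As a rough illustration of why plain glm copes with an outcome rescaled to [0,1], the sketch below fits a quasibinomial glm to a continuous outcome in (0, 1) using base R only. The data, coefficients, and variable names are made up for illustration, and the choice of quasibinomial here is my assumption about a reasonable family for such an outcome, not a statement of what ltmle does internally.

```r
# Base R only: a quasibinomial glm accepts a continuous outcome in (0, 1).
set.seed(1)
n <- 36                                   # same small sample size as in the question
A <- rbinom(n, 1, 0.6)                    # illustrative binary treatment
L <- A + rnorm(n)                         # illustrative covariate
Y01 <- plogis(0.5 * A + 0.3 * L + rnorm(n, sd = 0.5))  # continuous outcome in (0, 1)

fit <- glm(Y01 ~ A + L, family = quasibinomial())
pred <- predict(fit, type = "response")   # fitted values stay inside (0, 1)

print(summary(pred))
```

With only 36 observations, a small parametric model like this is also far less likely to drop out during cross-validation than a penalized learner.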

Josh