nt-williams / lmtp

Non-parametric Causal Effects Based on Modified Treatment Policies
http://www.beyondtheate.com
GNU Affero General Public License v3.0

Estimator fails when there is no variation in the outcome at a time point #92

Closed: kathoffman closed this issue 2 years ago

kathoffman commented 3 years ago

I'm running into issues when I have a rare or non-occurring outcome at a certain time point. I think it'd be helpful if lmtp would automatically recognize when there are no new outcomes to predict, and would not try to run a learner that will fail. I am currently using lmtp_0.9.1.5001 and sl3_1.4.2.

This is a reprex where I run into issues. I've changed the last time point Y.6 from the sim_point_surv data to be 0 instead of 1 for all rows where a new outcome has occurred (defined as Y.6==1 and Y.5==0). lmtp_tmle fails at 50% with the error message "subscript out of bounds". It does this for the default library of "Lrnr_glm" and for any other learners I try, such as "Lrnr_mean".

If possible, could this be handled when (1) an outcome doesn't occur at all, as in this example, (2) an outcome doesn't occur within the Super Learner's cross-validation folds, and (3) an outcome doesn't occur within the cross-fitting folds?

```r
library(lmtp)
library(tidyverse)
library(sl3)

# Modify the example data so that no one newly gets the outcome at the last
# time point (a new outcome is Y.6 == 1 with Y.5 == 0)
sim_point_surv_constant <-
  sim_point_surv %>%
  mutate(Y.6 = case_when(Y.6 == 1 & Y.5 == 0 ~ 0,
                         TRUE ~ Y.6))

# Code modified from Example 5.1
a <- "trt"
y <- paste0("Y.", 1:6)
cens <- paste0("C.", 0:5)
baseline <- c("W1", "W2")

# Default learner (Lrnr_glm): fails at 50% with "subscript out of bounds"
progressr::with_progress({
  psi5.1 <- lmtp_tmle(sim_point_surv_constant, a, y, baseline, cens = cens,
                      shift = static_binary_on, folds = 2,
                      outcome_type = "survival")
})

# Intercept-only learner (Lrnr_mean): fails the same way
progressr::with_progress({
  psi5.1 <- lmtp_tmle(sim_point_surv_constant, a, y, baseline, cens = cens,
                      shift = static_binary_on, folds = 2,
                      outcome_type = "survival",
                      learners_outcome = sl3::make_learner("Lrnr_mean"),
                      learners_trt = sl3::make_learner("Lrnr_mean"))
})
```
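Before fitting, it can help to confirm which time points actually contain new events. Below is a minimal base-R sketch, assuming wide survival data with the outcome columns listed in time order and using the definition of a new outcome from this thread (Y.t == 1 with Y.(t-1) == 0). `count_new_events()` and the toy data are hypothetical; at the first time point every 1 is counted as new (assuming everyone starts event-free), and rows whose previous outcome is missing are not counted.

```r
# Count new events per time point: Y.t == 1 among rows with Y.(t-1) == 0.
# Rows where either value is NA are not counted (na.rm = TRUE).
count_new_events <- function(data, y) {
  counts <- vapply(seq_along(y), function(t) {
    current <- data[[y[t]]]
    if (t == 1) {
      new_event <- current == 1                          # assume all start event-free
    } else {
      new_event <- current == 1 & data[[y[t - 1]]] == 0  # newly occurring outcome
    }
    sum(new_event, na.rm = TRUE)
  }, integer(1))
  setNames(counts, y)
}

# Toy wide data (hypothetical): NA marks follow-up after censoring
toy <- data.frame(
  Y.1 = c(0, 0, 1, 0),
  Y.2 = c(0, 1, 1, NA),
  Y.3 = c(0, 1, 1, NA)
)
count_new_events(toy, names(toy))  # Y.1 = 1, Y.2 = 1, Y.3 = 0
```

Applied to `sim_point_surv_constant` with the reprex's `y`, the `Y.6` entry should be 0, which is the situation that triggers the failure.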
nt-williams commented 3 years ago

This only affects the sl3 branches; it's a bug from porting lmtp from SuperLearner to sl3. The outcome regressions include a variation check: if the outcome has no variation, only an intercept-only model is passed to the Super Learner. This check was not migrated properly from SuperLearner to sl3:

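```r
# Guard used for the outcome regressions in the SuperLearner version: if the
# outcome has (numerically) no variation, use the intercept-only SL.mean
check_variation <- function(outcome, learners) {
  if (sd(outcome) < .Machine$double.eps) {
    return("SL.mean")
  }
  return(learners)
}
```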
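One way to write the guard so it carries over to sl3 is to pass the fallback in rather than hard-coding `"SL.mean"`. This is a hypothetical sketch, not the fix that landed in sl3-devel: `check_variation_with()` is an illustrative name, and unlike the original it drops missing outcomes and treats fewer than two observations as "no variation" (where `sd()` would return `NA` and the `if` would error).

```r
# Hypothetical variant of check_variation() with a caller-supplied fallback
check_variation_with <- function(outcome, learners, fallback) {
  outcome <- outcome[!is.na(outcome)]  # ignore missing outcomes
  # sd() is NA for fewer than two values, so treat that case as no variation too
  if (length(outcome) < 2 || sd(outcome) < .Machine$double.eps) {
    return(fallback)
  }
  learners
}

check_variation_with(c(0, 0, 0), "SL.glm", fallback = "SL.mean")   # "SL.mean"
check_variation_with(c(0, NA, 1), "SL.glm", fallback = "SL.mean")  # "SL.glm"
```

In an sl3-based workflow, the fallback would be an intercept-only learner object such as `sl3::make_learner("Lrnr_mean")`, the learner used in the reprex above.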
nt-williams commented 3 years ago

Fixed in sl3-devel

kathoffman commented 3 years ago

THANK U

kathoffman commented 3 years ago

OK, so some learners (e.g., Lrnr_glmnet) still seem to fail when the outcome is rare. You can see the error if you set one of the Y.6 values back to 1 and test with Lrnr_glmnet:

```r
library(lmtp)
library(tidyverse)
library(sl3)

# Modify the example data so that no one newly gets the outcome at the last
# time point (a new outcome is Y.6 == 1 with Y.5 == 0)
sim_point_surv_constant <-
  sim_point_surv %>%
  mutate(Y.6 = case_when(Y.6 == 1 & Y.5 == 0 ~ 0,
                         TRUE ~ Y.6))
# Set one Y.6 value back to 1 so the outcome is rare rather than absent
sim_point_surv_constant[309, "Y.6"] <- 1

# Code modified from Example 5.1
a <- "trt"
y <- paste0("Y.", 1:6)
cens <- paste0("C.", 0:5)
baseline <- c("W1", "W2")

# glmnet-only learners: fails at 75% (error below)
progressr::with_progress({
  psi5.1 <- lmtp_tmle(sim_point_surv_constant, a, y, baseline, cens = cens,
                      shift = static_binary_on, folds = 2,
                      outcome_type = "survival",
                      learners_outcome = sl3::make_learner("Lrnr_glmnet"),
                      learners_trt = sl3::make_learner("Lrnr_glmnet"))
})
```

The error message I'm getting is:

```
 |====================================== | 75%
Error in private$.train(subsetted_task, trained_sublearners) :
  All learners in stack have failed
Error in self$compute_step() :
  Error in private$.train(subsetted_task, trained_sublearners) :
  All learners in stack have failed
Failed on Stack
Warning message:
In private$.train(subsetted_task, trained_sublearners) :
  Lrnr_glmnet_NULL_deviance_10_1_100_TRUE failed with message: Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one multinomial or binomial class has 1 or 0 observations; not allowed
. It will be removed from the stack
```
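The glmnet message says a binomial class has 1 or 0 observations. If only one new event remains at the last time point, any training split built from it contains at most one event, so this check can trip even though the outcome is not constant overall. Below is a base-R sketch that counts the smaller class in each fold's training rows. `min_class_count_by_fold()` and the toy data are hypothetical, and the alternating fold assignment is a simplification, not how lmtp or sl3 actually build folds.

```r
# Smallest class size (events vs. non-events) among the training rows of each
# fold; a glmnet-style binomial check refuses to fit when this is 0 or 1.
min_class_count_by_fold <- function(y, fold_id) {
  folds <- sort(unique(fold_id))
  vapply(folds, function(v) {
    y_train <- y[fold_id != v]           # rows used to fit when fold v is held out
    y_train <- y_train[!is.na(y_train)]  # drop missing outcomes
    min(sum(y_train == 1), sum(y_train == 0))
  }, integer(1))
}

# Toy data: one event in ten rows, rows alternately assigned to folds 1 and 2
y_toy <- c(rep(0, 9), 1)
fold_toy <- rep(1:2, length.out = 10)
counts <- min_class_count_by_fold(y_toy, fold_toy)
counts            # 1 0
any(counts <= 1)  # TRUE: at least one fold's training set would be refused
```

The same reasoning applies one level down: the Super Learner's own cross-validation folds split each training set again, which makes a class of 0 or 1 observations even more likely.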

nt-williams commented 3 years ago

@hoffmakl This is an issue with glmnet's internal checks. The way to prevent lmtp from failing is to include additional base learners that won't fail in this situation. For example, if you include Lrnr_glm in the learner stacks, the procedure will succeed with warnings that Lrnr_glmnet failed in some instances and was given weight zero.
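Why adding Lrnr_glm helps can be illustrated with a toy stack in base R. This is not sl3's implementation: `fit_toy_stack()`, `strict_learner()` (a stand-in for glmnet's class-size check) and `tolerant_learner()` (an intercept-only stand-in) are all hypothetical, and the toy simply drops failed learners instead of fitting ensemble weights.

```r
# Toy stack (not sl3's code): a learner that errors is dropped with a warning,
# and the stack itself fails only if every learner fails.
fit_toy_stack <- function(learners, y) {
  fits <- lapply(names(learners), function(nm) {
    tryCatch(learners[[nm]](y), error = function(e) {
      warning(nm, " failed: ", conditionMessage(e), call. = FALSE)
      NULL
    })
  })
  names(fits) <- names(learners)
  ok <- !vapply(fits, is.null, logical(1))
  if (!any(ok)) stop("All learners in stack have failed", call. = FALSE)
  fits[ok]
}

# Stand-in for glmnet's check: refuse if either class has 0 or 1 observations
strict_learner <- function(y) {
  if (min(sum(y == 1), sum(y == 0)) <= 1) stop("one class has 1 or 0 observations")
  mean(y)
}
# Stand-in for a learner that tolerates rare events (intercept-only)
tolerant_learner <- function(y) mean(y)

y_train <- c(rep(0, 9), 1)

# Only the strict learner: it fails, so the whole stack fails
only_strict <- try(fit_toy_stack(list(strict = strict_learner), y_train), silent = TRUE)
inherits(only_strict, "try-error")  # TRUE

# Adding a tolerant learner: the strict one is dropped with a warning
fits <- fit_toy_stack(list(strict = strict_learner, tolerant = tolerant_learner), y_train)
names(fits)  # "tolerant"
```

In the reprex, the analogous change would be passing a stack that contains both learners, for example `sl3::make_learner_stack("Lrnr_glm", "Lrnr_glmnet")`, as `learners_outcome` and `learners_trt`; this is a suggestion based on sl3's stack helpers, not something tested in this thread.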