tlverse / sl3

💪 🤔 Modern Super Learning with Machine Learning Pipelines
https://tlverse.org/sl3/
GNU General Public License v3.0

Another problem with the interactions learner and pipelines #149

Closed jlstiles closed 6 years ago

jlstiles commented 6 years ago

I think this is related to the last issue with Lrnr_define_interactions. It matters because we will certainly want to do this in practice. If I piece together a super learner by hand, defining Lrnr_cv and fitting a metalearner, I only get warnings and no harm is done. Here it breaks.

library(data.table)
library(sl3)
library(origami)
library(R6)
library(SuperLearner)
library(here)

# Below we will perform a revere CV-TMLE
# generate the data
gendata = function(n, g0, c0, Q0) {
  W1 = runif(n, -3, 3)
  W2 = rnorm(n)
  W3 = runif(n)
  W4 = rnorm(n)
  z = rbinom(n, 1, g0(W1, W2, W3, W4))
  C = rbinom(n, 1, c0(z, W1, W2, W3, W4))
  y = rbinom(n, 1, Q0(z, W1, W2, W3, W4))
  data.frame(z, C, W1, W2, W3, W4, y)
}

g0_linear = function(W1, W2, W3, W4) {
  plogis(0.5 * (-0.8 * W1 + 0.39 * W2 + 0.08 * W3 - 0.12 * 
                  W4 - 0.15))
}

c0 = function(z, W1, W2, W3, W4) {
  plogis(0.5 * (-0.4 * W1 + 0.3 * W2^2 + 0.06 * abs(W3) - 0.30 * 
                  .3*z*W4 -.5*W4 + 1-.2*z))
}

Q0_1 = function(z, W1, W2, W3, W4) {
  plogis(0.14 * (2 * z + 3 * z * W1 + 6 * z * W3 * W4 + W2 * 
                   W1 + W3 * W4 + 10 * z * cos(W4)))
}

n = 1000
data = gendata(n, g0_linear, c0, Q0_1)
covariates = colnames(data[c(1,3:6)])

lrnr_mean = make_learner(Lrnr_mean)
lrnr_glm = make_learner(Lrnr_glm)
screen_cor <- Lrnr_pkg_SuperLearner_screener$new("screen.corP")

lrnr_interactions = Lrnr_define_interactions$new(interactions = list(zW1 = c("z", "W1")))

# the pipeline with interactions breaks
pipeline_glm <- make_learner(Pipeline, lrnr_interactions, screen_cor, lrnr_glm)

# the pipeline without interactions is OK
pipeline_glm1 <- make_learner(Pipeline, screen_cor, lrnr_glm)

nnls_lrnr <- Lrnr_nnls$new()
lrnr_sl <- Lrnr_sl$new(list(pipeline_glm, lrnr_mean), nnls_lrnr)
lrnr_sl1 <- Lrnr_sl$new(list(pipeline_glm1, lrnr_mean), nnls_lrnr)

# make the task to train Qbar on uncensored data
QAW_task = make_sl3_Task(data = data,
                             covariates = covariates,
                             outcome = "y")

sl_fit <- lrnr_sl$train(QAW_task)

# without this copy of the data the fit breaks; with the copy it is fine
QAW_task1 = make_sl3_Task(data = data.table::copy(data),
                         covariates = covariates,
                         outcome = "y")
sl_fit1 <- lrnr_sl1$train(QAW_task1)

Error in set(data, j = col_names, value = new_data) :
  It appears that at some earlier point, names of this data.table have been reassigned. Please ensure to use setnames() rather than names<- or colnames<-. Otherwise, please report to data.table issue tracker.
In addition: There were 26 warnings (use warnings() to see them)
Failed on chain
Error in self$compute_step() :
  Error in set(data, j = col_names, value = new_data) :
  It appears that at some earlier point, names of this data.table have been reassigned. Please ensure to use setnames() rather than names<- or colnames<-. Otherwise, please report to data.table issue tracker.
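The error message above recommends setnames() over names<- or colnames<- when renaming data.table columns. The following is a minimal sketch of those by-reference idioms, and of how data.table::copy() keeps such edits away from the original table. It does not reproduce the sl3 failure itself; the toy data and the names df, dt and dt_copy are illustrative assumptions, not part of sl3.

```r
library(data.table)

# Hypothetical toy data, standing in for the simulated data set above.
df <- data.frame(z = c(0, 1, 1), W1 = c(-1.2, 0.4, 2.1))

# as.data.table() returns a new data.table; df itself is left untouched.
dt <- as.data.table(df)

# copy() makes a deep copy, so by-reference edits to dt_copy do not reach dt.
dt_copy <- copy(dt)

# Rename by reference with setnames() rather than names<- or colnames<-.
setnames(dt_copy, "W1", "W1_renamed")

# Add an interaction column by reference with set().
set(dt_copy, j = "zW1", value = dt_copy$z * dt_copy$W1_renamed)

names(dt)       # "z" "W1"
names(dt_copy)  # "z" "W1_renamed" "zW1"
```

Because dt_copy is a deep copy, the rename and the new column appear only on dt_copy, which is the same isolation that data.table::copy(data) gives the second task in the example above.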

jeremyrcoyle commented 6 years ago

Fixed in https://github.com/tlverse/sl3/pull/150