nt-williams / lmtp

:package: Non-parametric Causal Effects Based on Modified Treatment Policies :crystal_ball:
http://www.beyondtheate.com
GNU Affero General Public License v3.0
55 stars 16 forks source link

Handling of null column in under shift dataframes #137

Closed jakewilliami closed 2 weeks ago

jakewilliami commented 1 month ago

Issue description

I was getting some errors with lmtp_{sdr, tmle, sub} when the input data is a tbl_df/tbl:

Error in `[[<-`(`*tmp*`, trt_var, value = NULL) : 
  Can't assign column with `trt_var`.
✖ Subscript `trt_var` must be a location, not a character `NA`.

This is because tbl_df/tbls are more type-safe that data.frames, so they have additional type handling.

See a possible fix in #136. Other approaches might include ensuring Task returns data.frames to avoid the error further up the stacktrace, or simply assert that the input data must be a data.frame.

reprex

library(lmtp)
library(tibble)
A <- "trt"
Y <- paste0("Y.", 1:6)
C <- paste0("C.", 0:5)
W <- c("W1", "W2")
lmtp_{sdr, tmle, sub}(sim_point_surv |> tibble(), A, Y, W, cens = C, folds = 2,
                      shift = static_binary_on, outcome_type = "survival")

Expected behavior

Expected behaviour is the absence of an error. Error is illustrated below.

rdf <- data.frame()
rdf[[as.character(NA)]] <- NULL  # This is fine; it will do nothing

tdf <- tibble()
tdf[[as.character(NA)]] <- NULL  # This will error as lmtp does

R session info

> utils::sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin23.2.0 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /opt/homebrew/Cellar/r/4.3.3/lib/R/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

time zone: Pacific/Auckland
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] nnls_1.5     tibble_3.2.1 lmtp_1.3.4   pacman_0.5.1

loaded via a namespace (and not attached):
 [1] gam_1.22-3          future.apply_1.11.2 gtable_0.3.5        dplyr_1.1.4         compiler_4.3.3      visdat_0.6.0        tidyselect_1.2.1    parallel_4.3.3      assertthat_0.2.1    splines_4.3.3       globals_0.16.3     
[12] scales_1.3.0        ggplot2_3.5.1       R6_2.5.1            generics_0.1.3      iterators_1.0.14    backports_1.4.1     checkmate_2.3.1     origami_1.0.7       future_1.33.2       munsell_0.5.1       pillar_1.9.0       
[23] rlang_1.1.3         utf8_1.2.4          cli_3.6.2           progressr_0.14.0    magrittr_2.0.3      foreach_1.5.2       digest_0.6.35       grid_4.3.3          rstudioapi_0.15.0   lifecycle_1.0.4     vctrs_0.6.5        
[34] glue_1.7.0          data.table_1.15.4   SuperLearner_2.0-29 listenv_0.9.1       codetools_0.2-19    abind_1.4-5         parallelly_1.37.1   fansi_1.0.6         colorspace_2.1-0    naniar_1.1.0        purrr_1.0.2        
[45] tools_4.3.3         pkgconfig_2.0.3 
nt-williams commented 2 weeks ago

@jakewilliami thanks for pointing this out. I've implemented a fix in the newest version of the package (v1.4.0).