different errors with augmenting survival models

topepo commented 12 months ago

I'm looking at unit tests that call augment() on workflows for survival models and check that it fails properly when eval_time is unspecified (see #200). We should get this error:

! The eval_time argument is missing, with no default.

When using the glmnet model, a glmnet-related error is triggered instead of the one for an improper argument call. This doesn't happen when just parsnip is used, so it is most likely a workflows issue.

library(tidymodels)
library(censored)
#> Loading required package: survival

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

workflow() %>%
  add_model(proportional_hazards()) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─workflows:::augment.workflow(., new_data = sim_dat)
#>  4.   ├─generics::augment(...)
#>  5.   └─parsnip:::augment.model_fit(fit, new_data_forged, eval_time = eval_time, ...)
#>  6.     └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  7.       └─rlang::abort(...)

workflow() %>%
  add_model(proportional_hazards(penalty = 0.001) %>% set_engine("glmnet")) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting 
#> a method for function 'as.matrix': Cholmod error 'X and/or Y have wrong 
#> dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 88

# Using parsnip alone, both engines give the expected error

proportional_hazards() %>%
  fit(event_time ~ ., data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─parsnip:::augment.model_fit(., new_data = sim_dat)
#>  4.   └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  5.     └─rlang::abort(...)

proportional_hazards(penalty = 0.001) %>% 
  set_engine("glmnet") %>% 
  fit(event_time ~ ., data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─parsnip:::augment.model_fit(., new_data = sim_dat)
#>  4.   └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  5.     └─rlang::abort(...)

Created on 2023-11-09 with reprex v2.0.2

topepo commented 12 months ago

The error occurs in parsnip:::augment_censored(). That calls

predict(x, new_data = new_data, type = "time") # 'x' is a parsnip model

and that fails when new_data is created by the workflow. For example, using the test data, the workflow's new_data looks like:

Browse[2]> new_data
# A tibble: 500 × 4
   `(Intercept)`    X1     X2 event_time
           <dbl> <dbl>  <dbl>     <Surv>
 1             1     1 -0.626  3.072882 
 2             1     1  0.184  1.010515 
 3             1     0 -0.836  5.739025 
 4             1     1  1.60   2.483024 
 5             1     0  0.330 10.896000 
 6             1     0 -0.820  9.783280+
 7             1     1  0.487  3.489154 
 8             1     1  0.738  6.022507+
 9             1     1  0.576  3.636429 
10             1     1 -0.305  6.119918 
# ℹ 490 more rows

Without a workflow, it is:

Browse[2]> new_data
# A tibble: 500 × 3
      X1     X2 event_time
   <dbl>  <dbl>     <Surv>
 1     1 -0.626  3.072882 
 2     1  0.184  1.010515 
 3     0 -0.836  5.739025 
 4     1  1.60   2.483024 
 5     0  0.330 10.896000 
 6     0 -0.820  9.783280+
 7     1  0.487  3.489154 
 8     1  0.738  6.022507+
 9     1  0.576  3.636429 
10     1 -0.305  6.119918 
# ℹ 490 more rows

Perhaps the addition of the intercept column causes the failure (although the error message does not suggest that).
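To make the hypothesis concrete, here is a minimal base-R sketch of how an extra intercept column could break a prediction step. It assumes, purely for illustration, that prediction boils down to a matrix product between the new data and the fitted coefficients. The data, the coefficients, and the names `x_plain`, `x_with_int`, and `beta` are made up, and this is not the glmnet or parsnip code path.

```r
# Hypothetical illustration only; not the glmnet or parsnip internals.
set.seed(1)
dat <- data.frame(X1 = rbinom(10, 1, 0.5), X2 = rnorm(10))

# Predictor matrix without an intercept column
# (assumed to be what the model was fit on).
x_plain <- model.matrix(~ X1 + X2 - 1, data = dat)

# Predictor matrix with an intercept column,
# similar in shape to the workflow's new_data above.
x_with_int <- model.matrix(~ X1 + X2, data = dat)

colnames(x_plain)
colnames(x_with_int)

# Made-up coefficients, one per predictor.
beta <- c(X1 = 0.5, X2 = -0.25)

# Two columns times two coefficients: works.
ok <- x_plain %*% beta

# Three columns times two coefficients: fails with a dimension error,
# which is captured here as a message instead of stopping the script.
res <- tryCatch(
  x_with_int %*% beta,
  error = function(e) conditionMessage(e)
)
res
```

The exact wording of the error depends on which matrix code does the multiplication, so a message about wrong dimensions would be consistent with this idea without confirming it.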

hfrick commented 12 months ago

@topepo which versions did you use here? I can't reproduce with the current dev versions of parsnip, censored, and workflows.

library(tidymodels)
library(censored)
#> Loading required package: survival

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

workflow() %>%
  add_model(proportional_hazards()) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─workflows:::augment.workflow(., new_data = sim_dat)
#>  4.   ├─generics::augment(...)
#>  5.   └─parsnip:::augment.model_fit(fit, new_data_forged, eval_time = eval_time, ...)
#>  6.     └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  7.       └─rlang::abort(...)

workflow() %>%
  add_model(proportional_hazards(penalty = 0.001) %>% set_engine("glmnet")) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─workflows:::augment.workflow(., new_data = sim_dat)
#>  4.   ├─generics::augment(...)
#>  5.   └─parsnip:::augment.model_fit(fit, new_data_forged, eval_time = eval_time, ...)
#>  6.     └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  7.       └─rlang::abort(...)

Created on 2023-11-13 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Sonoma 14.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/London
#>  date     2023-11-13
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  backports      1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
#>  broom        * 1.0.5      2023-06-09 [1] CRAN (R 4.3.0)
#>  censored     * 0.2.0.9000 2023-07-18 [1] Github (tidymodels/censored@f9eccb6)
#>  class          7.3-22     2023-05-03 [2] CRAN (R 4.3.1)
#>  cli            3.6.1.9000 2023-09-26 [1] Github (r-lib/cli@641fe8c)
#>  codetools      0.2-19     2023-02-01 [2] CRAN (R 4.3.1)
#>  colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
#>  data.table     1.14.8     2023-02-17 [1] CRAN (R 4.3.1)
#>  dials        * 1.2.0      2023-04-03 [1] CRAN (R 4.3.1)
#>  DiceDesign     1.9        2021-02-13 [1] CRAN (R 4.3.0)
#>  digest         0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
#>  dplyr        * 1.1.3      2023-09-03 [1] CRAN (R 4.3.0)
#>  evaluate       0.23       2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi          1.0.5      2023-10-08 [1] CRAN (R 4.3.1)
#>  fastmap        1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
#>  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
#>  fs             1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
#>  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.3.1)
#>  future         1.33.0     2023-07-01 [1] CRAN (R 4.3.1)
#>  future.apply   1.11.0     2023-05-21 [1] CRAN (R 4.3.1)
#>  generics       0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2      * 3.4.4      2023-10-12 [1] CRAN (R 4.3.1)
#>  glmnet         4.1-8      2023-08-22 [1] CRAN (R 4.3.0)
#>  globals        0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
#>  gower          1.0.1      2022-12-22 [1] CRAN (R 4.3.1)
#>  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.3.0)
#>  gtable         0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
#>  hardhat        1.3.0      2023-03-30 [1] CRAN (R 4.3.0)
#>  htmltools      0.5.7      2023-11-03 [1] CRAN (R 4.3.1)
#>  infer        * 1.0.5      2023-09-06 [1] CRAN (R 4.3.0)
#>  ipred          0.9-14     2023-03-09 [1] CRAN (R 4.3.1)
#>  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
#>  knitr          1.45       2023-10-30 [1] CRAN (R 4.3.1)
#>  lattice        0.22-5     2023-10-24 [1] CRAN (R 4.3.1)
#>  lava           1.7.3      2023-11-04 [1] CRAN (R 4.3.1)
#>  lhs            1.1.6      2022-12-17 [1] CRAN (R 4.3.1)
#>  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.3.1)
#>  listenv        0.9.0      2022-12-16 [1] CRAN (R 4.3.1)
#>  lubridate      1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS           7.3-60     2023-05-04 [2] CRAN (R 4.3.1)
#>  Matrix         1.6-1.1    2023-09-18 [1] CRAN (R 4.3.1)
#>  modeldata    * 1.2.0      2023-08-09 [1] CRAN (R 4.3.0)
#>  modelenv       0.1.1      2023-03-08 [1] CRAN (R 4.3.1)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
#>  nnet           7.3-19     2023-05-03 [2] CRAN (R 4.3.1)
#>  parallelly     1.36.0     2023-05-26 [1] CRAN (R 4.3.1)
#>  parsnip      * 1.1.1.9001 2023-11-10 [1] Github (tidymodels/parsnip@86f8a4e)
#>  pillar         1.9.0.9003 2023-11-10 [1] Github (r-lib/pillar@92fdbba)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>  prodlim        2023.08.28 2023-08-28 [1] CRAN (R 4.3.0)
#>  purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache        0.15.0     2021-04-30 [1] CRAN (R 4.3.0)
#>  R.methodsS3    1.8.1      2020-08-26 [1] CRAN (R 4.3.0)
#>  R.oo           1.24.0     2020-08-26 [1] CRAN (R 4.3.0)
#>  R.utils        2.11.0     2021-09-26 [1] CRAN (R 4.3.0)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp           1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
#>  recipes      * 1.0.8.9000 2023-11-10 [1] Github (tidymodels/recipes@746b473)
#>  reprex         2.0.2      2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang          1.1.2      2023-11-04 [1] CRAN (R 4.3.1)
#>  rmarkdown      2.25       2023-09-18 [1] CRAN (R 4.3.1)
#>  rpart          4.1.21     2023-10-09 [1] CRAN (R 4.3.1)
#>  rsample      * 1.2.0.9000 2023-11-01 [1] Github (tidymodels/rsample@be593b9)
#>  rstudioapi     0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
#>  scales       * 1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  shape          1.4.6      2021-05-19 [1] CRAN (R 4.3.1)
#>  styler         1.7.0      2022-03-13 [1] CRAN (R 4.3.0)
#>  survival     * 3.5-7      2023-08-14 [1] CRAN (R 4.3.0)
#>  tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>  tidymodels   * 1.1.1      2023-08-24 [1] CRAN (R 4.3.1)
#>  tidyr        * 1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
#>  timechange     0.2.0      2023-01-11 [1] CRAN (R 4.3.1)
#>  timeDate       4022.108   2023-01-07 [1] CRAN (R 4.3.1)
#>  tune         * 1.1.2.9000 2023-11-01 [1] Github (tidymodels/tune@3f82cb2)
#>  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs          0.6.4      2023-10-12 [1] CRAN (R 4.3.1)
#>  withr          2.5.2      2023-10-30 [1] CRAN (R 4.3.1)
#>  workflows    * 1.1.3.9000 2023-11-13 [1] Github (tidymodels/workflows@1413997)
#>  workflowsets * 1.0.1      2023-04-06 [1] CRAN (R 4.3.1)
#>  xfun           0.41       2023-11-01 [1] CRAN (R 4.3.1)
#>  yaml           2.3.7      2023-01-23 [1] CRAN (R 4.3.0)
#>  yardstick    * 1.2.0.9001 2023-11-01 [1] Github (tidymodels/yardstick@690e738)
#>
#>  [1] /Users/hannah/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

hfrick commented 12 months ago

Surfaced in https://github.com/tidymodels/extratests/pull/120. The corresponding test in extratests should be checked/updated as part of closing this.
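For context, a test along these lines might look like the sketch below. This is only an illustration built from the reprex in this issue, not the actual test in extratests, which may be structured differently (for example with snapshots). It assumes tidymodels, censored, glmnet, prodlim, and testthat are installed, and the `specs` list and the loop are names introduced here for the example.

```r
library(testthat)
library(tidymodels)
library(censored)

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

# One model spec per engine compared in this issue.
specs <- list(
  survival = proportional_hazards(),
  glmnet = proportional_hazards(penalty = 0.001) %>% set_engine("glmnet")
)

for (engine in names(specs)) {
  test_that(paste("augment() needs eval_time with the", engine, "engine"), {
    wf_fit <- workflow() %>%
      add_model(specs[[engine]]) %>%
      add_formula(event_time ~ .) %>%
      fit(data = sim_dat)

    # Both engines should fail on the missing argument,
    # not on an engine-specific error.
    expect_error(augment(wf_fit, new_data = sim_dat), "eval_time")
  })
}
```

Matching only on "eval_time" keeps the check loose enough to survive small changes in the wording of the message.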

hfrick commented 9 months ago

I still can't reproduce the difference between glmnet and survival as the engines, so I'm closing this now.

library(tidymodels)
library(censored)
#> Loading required package: survival

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

workflow() %>%
  add_model(proportional_hazards()) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.

workflow() %>%
  add_model(proportional_hazards(penalty = 0.001) %>% 
              set_engine("glmnet")) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.

Created on 2024-01-16 with reprex v2.0.2
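For completeness, the error in both cases is about the missing argument, so supplying eval_time should let augment() run. A minimal sketch, assuming package versions where augment() for workflows accepts eval_time (as the backtraces earlier in this issue suggest); the time points 1, 5, and 10 are arbitrary values chosen for illustration:

```r
library(tidymodels)
library(censored)

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

wf_fit <- workflow() %>%
  add_model(proportional_hazards()) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat)

# eval_time gives the time points at which survival
# probabilities are predicted; these values are arbitrary.
res <- augment(wf_fit, new_data = sim_dat, eval_time = c(1, 5, 10))
names(res)
```

The result should keep the original columns of sim_dat and add prediction columns alongside them.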

github-actions[bot] commented 9 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

tidymodels / workflows

different errors with augmenting survival models #209