tidymodels / workflows

Modeling Workflows
https://workflows.tidymodels.org/

Different errors when augmenting survival models #209

Closed: topepo closed this issue 9 months ago

topepo commented 12 months ago

I was looking at unit tests for using augment() on workflows with survival models, specifically tests that check it fails properly when eval_time is unspecified (see #200). We should get this error:

! The eval_time argument is missing, with no default.

With the glmnet engine, a glmnet-related (Matrix/Cholmod) error is triggered instead of the missing-argument error. This doesn't happen when parsnip is used on its own, so it is most likely a workflows issue.

library(tidymodels)
library(censored)
#> Loading required package: survival

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

workflow() %>%
  add_model(proportional_hazards()) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─workflows:::augment.workflow(., new_data = sim_dat)
#>  4.   ├─generics::augment(...)
#>  5.   └─parsnip:::augment.model_fit(fit, new_data_forged, eval_time = eval_time, ...)
#>  6.     └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  7.       └─rlang::abort(...)

workflow() %>%
  add_model(proportional_hazards(penalty = 0.001) %>% set_engine("glmnet")) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting 
#> a method for function 'as.matrix': Cholmod error 'X and/or Y have wrong 
#> dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 88
# With parsnip alone, both engines give the expected error

proportional_hazards() %>%
  fit(event_time ~ ., data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─parsnip:::augment.model_fit(., new_data = sim_dat)
#>  4.   └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  5.     └─rlang::abort(...)

proportional_hazards(penalty = 0.001) %>% 
  set_engine("glmnet") %>% 
  fit(event_time ~ ., data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─parsnip:::augment.model_fit(., new_data = sim_dat)
#>  4.   └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  5.     └─rlang::abort(...)

Created on 2023-11-09 with reprex v2.0.2
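For contrast, here is a minimal sketch of the call these tests guard against omitting: passing `eval_time` to `augment()`. The time points `c(2, 5)` are illustrative choices, not values from this thread, and the sketch assumes package versions in which `augment()` on a censored-regression workflow accepts an `eval_time` argument (as the error message above implies). It uses the default survival engine.

```r
library(tidymodels)
library(censored)

# Same simulated data as in the reprex above
set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

cox_wflow_fit <- workflow() %>%
  add_model(proportional_hazards()) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat)

# eval_time values are illustrative; they should fall inside the observed follow-up
aug <- augment(cox_wflow_fit, new_data = sim_dat, eval_time = c(2, 5))

# augment() keeps one row per row of new_data and adds prediction columns
dim(aug)
```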

topepo commented 12 months ago

The error occurs in parsnip:::augment_censored(). That calls

predict(x, new_data = new_data, type = "time") # 'x' is a parsnip model

and that call fails when new_data has been created by the workflow. For example, with the test data, the workflow's new_data looks like:

Browse[2]> new_data
# A tibble: 500 × 4
   `(Intercept)`    X1     X2 event_time
           <dbl> <dbl>  <dbl>     <Surv>
 1             1     1 -0.626  3.072882 
 2             1     1  0.184  1.010515 
 3             1     0 -0.836  5.739025 
 4             1     1  1.60   2.483024 
 5             1     0  0.330 10.896000 
 6             1     0 -0.820  9.783280+
 7             1     1  0.487  3.489154 
 8             1     1  0.738  6.022507+
 9             1     1  0.576  3.636429 
10             1     1 -0.305  6.119918 
# ℹ 490 more rows

Without a workflow, it is:

Browse[2]> new_data
# A tibble: 500 × 3
      X1     X2 event_time
   <dbl>  <dbl>     <Surv>
 1     1 -0.626  3.072882 
 2     1  0.184  1.010515 
 3     0 -0.836  5.739025 
 4     1  1.60   2.483024 
 5     0  0.330 10.896000 
 6     0 -0.820  9.783280+
 7     1  0.487  3.489154 
 8     1  0.738  6.022507+
 9     1  0.576  3.636429 
10     1 -0.305  6.119918 
# ℹ 490 more rows

Perhaps the addition of the intercept column causes the failure (although the error message does not suggest that).
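The intercept hypothesis can be illustrated in base R. This sketch is an assumption about the mechanism, not a diagnosis of the actual code path. It shows only that a design matrix carrying an extra `(Intercept)` column no longer conforms to a coefficient vector without one, which is the same class of failure that the Cholmod "X and/or Y have wrong dimensions" message reports for sparse matrices. The data and coefficients are made up.

```r
# Hypothetical illustration: a linear predictor X %*% beta only works when the
# number of columns in X matches the number of coefficients in beta.
set.seed(1)
dat <- data.frame(X1 = rbinom(5, size = 1, prob = 0.5), X2 = rnorm(5))

# model.matrix() adds an "(Intercept)" column, like the workflow's new_data above
x_with_int <- model.matrix(~ X1 + X2, data = dat)
print(colnames(x_with_int))

# Coefficients for X1 and X2 only (made-up values; a Cox model has no intercept)
beta <- c(X1 = 0.5, X2 = -0.25)

# Dropping the intercept column gives a 5 x 2 matrix that conforms with beta
x_no_int <- x_with_int[, colnames(x_with_int) != "(Intercept)", drop = FALSE]
lp_ok <- drop(x_no_int %*% beta)

# Keeping it gives a 5 x 3 matrix, and the multiplication fails
lp_bad <- tryCatch(x_with_int %*% beta, error = function(e) conditionMessage(e))
print(lp_bad)
```

With a sparse `Matrix` object instead of a base matrix, the analogous mismatch is reported by Cholmod rather than as "non-conformable arguments", which may be why the original message does not mention the intercept.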

hfrick commented 12 months ago

@topepo which versions did you use here? I can't reproduce with the current dev versions of parsnip, censored, and workflows.

library(tidymodels)
library(censored)
#> Loading required package: survival

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

workflow() %>%
  add_model(proportional_hazards()) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─workflows:::augment.workflow(., new_data = sim_dat)
#>  4.   ├─generics::augment(...)
#>  5.   └─parsnip:::augment.model_fit(fit, new_data_forged, eval_time = eval_time, ...)
#>  6.     └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  7.       └─rlang::abort(...)

workflow() %>%
  add_model(proportional_hazards(penalty = 0.001) %>% set_engine("glmnet")) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.
#> Backtrace:
#>     ▆
#>  1. ├─... %>% augment(new_data = sim_dat)
#>  2. ├─generics::augment(., new_data = sim_dat)
#>  3. └─workflows:::augment.workflow(., new_data = sim_dat)
#>  4.   ├─generics::augment(...)
#>  5.   └─parsnip:::augment.model_fit(fit, new_data_forged, eval_time = eval_time, ...)
#>  6.     └─parsnip:::augment_censored(x, new_data, eval_time = eval_time)
#>  7.       └─rlang::abort(...)

Created on 2023-11-13 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Sonoma 14.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/London
#>  date     2023-11-13
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  backports      1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
#>  broom        * 1.0.5      2023-06-09 [1] CRAN (R 4.3.0)
#>  censored     * 0.2.0.9000 2023-07-18 [1] Github (tidymodels/censored@f9eccb6)
#>  class          7.3-22     2023-05-03 [2] CRAN (R 4.3.1)
#>  cli            3.6.1.9000 2023-09-26 [1] Github (r-lib/cli@641fe8c)
#>  codetools      0.2-19     2023-02-01 [2] CRAN (R 4.3.1)
#>  colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
#>  data.table     1.14.8     2023-02-17 [1] CRAN (R 4.3.1)
#>  dials        * 1.2.0      2023-04-03 [1] CRAN (R 4.3.1)
#>  DiceDesign     1.9        2021-02-13 [1] CRAN (R 4.3.0)
#>  digest         0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
#>  dplyr        * 1.1.3      2023-09-03 [1] CRAN (R 4.3.0)
#>  evaluate       0.23       2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi          1.0.5      2023-10-08 [1] CRAN (R 4.3.1)
#>  fastmap        1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
#>  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
#>  fs             1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
#>  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.3.1)
#>  future         1.33.0     2023-07-01 [1] CRAN (R 4.3.1)
#>  future.apply   1.11.0     2023-05-21 [1] CRAN (R 4.3.1)
#>  generics       0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2      * 3.4.4      2023-10-12 [1] CRAN (R 4.3.1)
#>  glmnet         4.1-8      2023-08-22 [1] CRAN (R 4.3.0)
#>  globals        0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
#>  gower          1.0.1      2022-12-22 [1] CRAN (R 4.3.1)
#>  GPfit          1.0-8      2019-02-08 [1] CRAN (R 4.3.0)
#>  gtable         0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
#>  hardhat        1.3.0      2023-03-30 [1] CRAN (R 4.3.0)
#>  htmltools      0.5.7      2023-11-03 [1] CRAN (R 4.3.1)
#>  infer        * 1.0.5      2023-09-06 [1] CRAN (R 4.3.0)
#>  ipred          0.9-14     2023-03-09 [1] CRAN (R 4.3.1)
#>  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
#>  knitr          1.45       2023-10-30 [1] CRAN (R 4.3.1)
#>  lattice        0.22-5     2023-10-24 [1] CRAN (R 4.3.1)
#>  lava           1.7.3      2023-11-04 [1] CRAN (R 4.3.1)
#>  lhs            1.1.6      2022-12-17 [1] CRAN (R 4.3.1)
#>  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.3.1)
#>  listenv        0.9.0      2022-12-16 [1] CRAN (R 4.3.1)
#>  lubridate      1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS           7.3-60     2023-05-04 [2] CRAN (R 4.3.1)
#>  Matrix         1.6-1.1    2023-09-18 [1] CRAN (R 4.3.1)
#>  modeldata    * 1.2.0      2023-08-09 [1] CRAN (R 4.3.0)
#>  modelenv       0.1.1      2023-03-08 [1] CRAN (R 4.3.1)
#>  munsell        0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
#>  nnet           7.3-19     2023-05-03 [2] CRAN (R 4.3.1)
#>  parallelly     1.36.0     2023-05-26 [1] CRAN (R 4.3.1)
#>  parsnip      * 1.1.1.9001 2023-11-10 [1] Github (tidymodels/parsnip@86f8a4e)
#>  pillar         1.9.0.9003 2023-11-10 [1] Github (r-lib/pillar@92fdbba)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>  prodlim        2023.08.28 2023-08-28 [1] CRAN (R 4.3.0)
#>  purrr        * 1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache        0.15.0     2021-04-30 [1] CRAN (R 4.3.0)
#>  R.methodsS3    1.8.1      2020-08-26 [1] CRAN (R 4.3.0)
#>  R.oo           1.24.0     2020-08-26 [1] CRAN (R 4.3.0)
#>  R.utils        2.11.0     2021-09-26 [1] CRAN (R 4.3.0)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp           1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
#>  recipes      * 1.0.8.9000 2023-11-10 [1] Github (tidymodels/recipes@746b473)
#>  reprex         2.0.2      2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang          1.1.2      2023-11-04 [1] CRAN (R 4.3.1)
#>  rmarkdown      2.25       2023-09-18 [1] CRAN (R 4.3.1)
#>  rpart          4.1.21     2023-10-09 [1] CRAN (R 4.3.1)
#>  rsample      * 1.2.0.9000 2023-11-01 [1] Github (tidymodels/rsample@be593b9)
#>  rstudioapi     0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
#>  scales       * 1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  shape          1.4.6      2021-05-19 [1] CRAN (R 4.3.1)
#>  styler         1.7.0      2022-03-13 [1] CRAN (R 4.3.0)
#>  survival     * 3.5-7      2023-08-14 [1] CRAN (R 4.3.0)
#>  tibble       * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>  tidymodels   * 1.1.1      2023-08-24 [1] CRAN (R 4.3.1)
#>  tidyr        * 1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
#>  timechange     0.2.0      2023-01-11 [1] CRAN (R 4.3.1)
#>  timeDate       4022.108   2023-01-07 [1] CRAN (R 4.3.1)
#>  tune         * 1.1.2.9000 2023-11-01 [1] Github (tidymodels/tune@3f82cb2)
#>  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs          0.6.4      2023-10-12 [1] CRAN (R 4.3.1)
#>  withr          2.5.2      2023-10-30 [1] CRAN (R 4.3.1)
#>  workflows    * 1.1.3.9000 2023-11-13 [1] Github (tidymodels/workflows@1413997)
#>  workflowsets * 1.0.1      2023-04-06 [1] CRAN (R 4.3.1)
#>  xfun           0.41       2023-11-01 [1] CRAN (R 4.3.1)
#>  yaml           2.3.7      2023-01-23 [1] CRAN (R 4.3.0)
#>  yardstick    * 1.2.0.9001 2023-11-01 [1] Github (tidymodels/yardstick@690e738)
#> 
#>  [1] /Users/hannah/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
```

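When two people get different results from the same reprex, comparing only the packages involved is often enough to start. A minimal base-R sketch that reports their versions and skips any that are not installed (the package list is an illustrative choice; `sessioninfo::session_info()`, as used in the comment above, gives the full picture):

```r
# Packages involved in this report (illustrative list)
pkgs <- c("parsnip", "censored", "workflows", "hardhat", "glmnet")

# system.file() returns "" for packages that are not installed
is_installed <- nzchar(vapply(pkgs, function(p) system.file(package = p), character(1)))
installed <- pkgs[is_installed]

# Named character vector: package name -> installed version
versions <- vapply(
  installed,
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)
print(versions)
```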
hfrick commented 12 months ago

This surfaced in https://github.com/tidymodels/extratests/pull/120. The corresponding test in extratests should be checked/updated as part of closing this.
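A minimal sketch of what such a test might look like with testthat, assuming the package versions discussed in this thread (where `augment()` errors when `eval_time` is missing). It is not the actual extratests code; the object name `glmnet_wflow_fit` and the `"eval_time"` pattern are illustrative.

```r
library(testthat)
library(tidymodels)
library(censored)

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

# Fit the glmnet-engine workflow from the reprex once, outside the test
glmnet_wflow_fit <- workflow() %>%
  add_model(proportional_hazards(penalty = 0.001) %>% set_engine("glmnet")) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat)

test_that("augment() on a glmnet survival workflow requires eval_time", {
  # Expect the missing-argument error, not a Matrix/Cholmod dimension error
  expect_error(
    augment(glmnet_wflow_fit, new_data = sim_dat),
    "eval_time"
  )
})
```

Matching on `"eval_time"` rather than the full message keeps the test from breaking on small wording changes, while still failing if a different error (such as the Cholmod one) is raised.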

hfrick commented 9 months ago

I still can't reproduce the difference between the glmnet and survival engines, so I'm closing this now.

library(tidymodels)
library(censored)
#> Loading required package: survival

set.seed(1)
sim_dat <- prodlim::SimSurv(500) %>%
  mutate(event_time = Surv(time, event)) %>%
  select(event_time, X1, X2)

workflow() %>%
  add_model(proportional_hazards()) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.

workflow() %>%
  add_model(proportional_hazards(penalty = 0.001) %>% 
              set_engine("glmnet")) %>%
  add_formula(event_time ~ .) %>%
  fit(data = sim_dat) %>% 
  augment(new_data = sim_dat)
#> Error in `augment()`:
#> ! The `eval_time` argument is missing, with no default.

Created on 2024-01-16 with reprex v2.0.2

github-actions[bot] commented 9 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.