tidymodels / censored

Parsnip wrappers for survival models
https://censored.tidymodels.org/

Using `fit_resamples` with `set_mode("censored regression")` produces `Error in check_metrics(): ! Unknown mode for parsnip model` #195

Closed mikemahoney218 closed 1 year ago

mikemahoney218 commented 2 years ago

Apologies if I'm doing something silly here -- or if this is on the roadmap already, or I'm in the wrong place!

I can't seem to figure out how to fit censored models using fit_resamples(), which I think is the only way to fit to rsample objects (without writing code to loop through the sets yourself, at any rate). Am I making an obvious mistake here, by any chance?

library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(rsample)
library(workflows)
library(tune)

data(cancer)

lung <- lung |> 
  tidyr::drop_na()

lung_resamples <- vfold_cv(lung, strata = "status")

sr_mod <- 
  linear_reg() %>%
  set_mode("regression")

set.seed(1)
workflow() |> 
  add_model(sr_mod) |> 
  add_formula(status ~ .) |> 
  fit_resamples(lung_resamples)
#> # Resampling results
#> # 10-fold cross-validation using stratification 
#> # A tibble: 10 × 4
#>    splits           id     .metrics         .notes          
#>    <list>           <chr>  <list>           <list>          
#>  1 <split [150/17]> Fold01 <tibble [2 × 4]> <tibble [0 × 3]>
#>  2 <split [150/17]> Fold02 <tibble [2 × 4]> <tibble [0 × 3]>
#>  3 <split [150/17]> Fold03 <tibble [2 × 4]> <tibble [0 × 3]>
#>  4 <split [150/17]> Fold04 <tibble [2 × 4]> <tibble [0 × 3]>
#>  5 <split [150/17]> Fold05 <tibble [2 × 4]> <tibble [0 × 3]>
#>  6 <split [150/17]> Fold06 <tibble [2 × 4]> <tibble [0 × 3]>
#>  7 <split [150/17]> Fold07 <tibble [2 × 4]> <tibble [0 × 3]>
#>  8 <split [151/16]> Fold08 <tibble [2 × 4]> <tibble [0 × 3]>
#>  9 <split [151/16]> Fold09 <tibble [2 × 4]> <tibble [0 × 3]>
#> 10 <split [151/16]> Fold10 <tibble [2 × 4]> <tibble [0 × 3]>

library(censored)
library(rsample)
library(workflows)
library(tune)

data(cancer)

lung <- lung |> 
  tidyr::drop_na()

lung_resamples <- vfold_cv(lung, strata = "status")

sr_mod <- 
  survival_reg(dist = "weibull") %>%
  set_engine("survival") %>% 
  set_mode("censored regression")

set.seed(1)
sr_fit <- workflow() |> 
  add_model(sr_mod) |> 
  add_formula(Surv(time, status) ~ .) |> 
  fit_resamples(lung_resamples)
#> Error in `check_metrics()`:
#> ! Unknown `mode` for parsnip model.

library(censored)
library(rsample)
library(workflows)
library(tune)

data(cancer)

lung <- lung |> 
  tidyr::drop_na()

lung_resamples <- vfold_cv(lung, strata = "status")

rf_mod <- 
  rand_forest(trees = 200) %>%
  set_engine("partykit") %>% 
  set_mode("censored regression")

set.seed(1)
rf_fit <- workflow() |> 
  add_model(rf_mod) |> 
  add_formula(Surv(time, status) ~ .) |> 
  fit_resamples(lung_resamples)
#> Error in `check_metrics()`:
#> ! Unknown `mode` for parsnip model.

Created on 2022-06-15 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.0 (2022-04-22)
#> os Ubuntu 20.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2022-06-15
#> pandoc 2.17.1.1 @ /usr/lib/rstudio/bin/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
#> censored * 0.0.0.9000 2022-06-16 [1] Github (tidymodels/censored@24015a6)
#> class 7.3-20 2022-01-13 [4] CRAN (R 4.1.2)
#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)
#> codetools 0.2-18 2020-11-04 [4] CRAN (R 4.0.3)
#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)
#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)
#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)
#> dials 1.0.0 2022-06-14 [1] CRAN (R 4.2.0)
#> DiceDesign 1.9 2021-02-13 [1] CRAN (R 4.2.0)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
#> dplyr 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
#> foreach 1.5.2 2022-02-02 [1] CRAN (R 4.2.0)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
#> furrr 0.3.0 2022-05-04 [1] CRAN (R 4.2.0)
#> future 1.26.1 2022-05-27 [1] CRAN (R 4.2.0)
#> future.apply 1.9.0 2022-04-25 [1] CRAN (R 4.2.0)
#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)
#> ggplot2 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)
#> globals 0.15.0 2022-05-09 [1] CRAN (R 4.2.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> gower 1.0.0 2022-02-03 [1] CRAN (R 4.2.0)
#> GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.2.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)
#> hardhat 1.1.0 2022-06-10 [1] CRAN (R 4.2.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.0)
#> ipred 0.9-13 2022-06-02 [1] CRAN (R 4.2.0)
#> iterators 1.0.14 2022-02-05 [1] CRAN (R 4.2.0)
#> knitr 1.39 2022-04-26 [1] CRAN (R 4.2.0)
#> lattice 0.20-45 2021-09-22 [4] CRAN (R 4.2.0)
#> lava 1.6.10 2021-09-02 [1] CRAN (R 4.2.0)
#> lhs 1.1.5 2022-03-22 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 4.2.0)
#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> MASS 7.3-57 2022-04-22 [4] CRAN (R 4.2.0)
#> Matrix 1.4-1 2022-03-23 [4] CRAN (R 4.1.3)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
#> nnet 7.3-17 2022-01-13 [4] CRAN (R 4.1.2)
#> parallelly 1.32.0 2022-06-07 [1] CRAN (R 4.2.0)
#> parsnip * 0.2.1.9003 2022-06-16 [1] Github (tidymodels/parsnip@ad4a491)
#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> prodlim 2019.11.13 2019-11-17 [1] CRAN (R 4.2.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.2.0)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.2.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.2.0)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.2.0)
#> recipes 0.2.0 2022-02-18 [1] CRAN (R 4.2.0)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.0)
#> rlang 1.0.2.9003 2022-06-16 [1] Github (r-lib/rlang@62cd789)
#> rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.0)
#> rpart 4.1.16 2022-01-24 [4] CRAN (R 4.1.2)
#> rsample * 0.1.1 2021-11-08 [1] CRAN (R 4.2.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
#> styler 1.7.0 2022-03-13 [1] CRAN (R 4.2.0)
#> survival * 3.3-1 2022-03-03 [1] CRAN (R 4.2.0)
#> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)
#> tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.2.0)
#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)
#> timeDate 3043.102 2018-02-21 [1] CRAN (R 4.2.0)
#> tune * 0.2.0.9002 2022-06-16 [1] Github (tidymodels/tune@66323cb)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> workflows * 0.2.6.9001 2022-06-16 [1] Github (tidymodels/workflows@4f9c323)
#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
#> yardstick 1.0.0.9000 2022-06-16 [1] Github (tidymodels/yardstick@90ab794)
#>
#> [1] /home/mikemahoney218/R/x86_64-pc-linux-gnu-library/4.2
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

hfrick commented 2 years ago

Thanks for trying out censored! :raised_hands: Getting survival analysis to work smoothly with tidymodels requires changes in various places across tidymodels, i.e. in more packages than just censored.

To get a workflow to fit, you need to use add_variables() instead of add_formula(). My current understanding is that this is something we'll look into in workflows.

Tuning (here, tune::fit_resamples()) doesn't work yet because yardstick doesn't have a suitable metric for censored regression and we haven't adapted tune for survival analysis.

So yes, this is on the roadmap!

library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(rsample)
library(workflows)
library(tune)

data(cancer)

lung <- lung |> 
  tidyr::drop_na()

set.seed(1)
lung_resamples <- vfold_cv(lung, strata = "status")

sr_mod <- 
  survival_reg(dist = "weibull") %>%
  set_engine("survival") %>% 
  set_mode("censored regression")

# workflow works with `add_variables()`
sr_wfl <- workflow() |> 
  add_variables(outcomes = c(time, status),
                predictors = everything()) |> 
  add_model(sr_mod, formula = Surv(time, status) ~ .)

plain_fit <- fit(sr_wfl, lung) 

# tuning does not work yet
sr_fit <- sr_wfl |>
  fit_resamples(lung_resamples)
#> Error in `check_metrics()`:
#> ! Unknown `mode` for parsnip model.

Created on 2022-06-16 by the reprex package (v2.0.1)
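
Until fit_resamples() supports this mode, the manual loop over the resamples mentioned in the original question can be sketched roughly as below. This is only an illustration, not the tidymodels-recommended approach: `fit_one_fold()` is a hypothetical helper, the choice of 5 folds is arbitrary, and Harrell's C from survival::concordance() stands in for a proper yardstick metric. It assumes predict() with type = "time" is available for this model and engine.

```r
library(censored)   # also attaches parsnip and survival
library(rsample)

data(cancer, package = "survival")
lung_complete <- tidyr::drop_na(lung)

set.seed(1)
folds <- vfold_cv(lung_complete, v = 5)

sr_mod <- survival_reg(dist = "weibull") |>
  set_engine("survival") |>
  set_mode("censored regression")

# Hypothetical helper: fit on the analysis set, score on the assessment set
fit_one_fold <- function(split) {
  fitted <- fit(sr_mod, Surv(time, status) ~ ., data = analysis(split))

  held_out <- assessment(split)
  pred <- predict(fitted, new_data = held_out, type = "time")
  held_out$.pred_time <- pred$.pred_time

  # Harrell's C: larger predicted times should go with longer observed survival
  survival::concordance(Surv(time, status) ~ .pred_time, data = held_out)$concordance
}

# One concordance value per fold, then the average across folds
c_index <- vapply(folds$splits, fit_one_fold, numeric(1))
mean(c_index)
```

The loop bypasses workflows and tune entirely, so nothing here depends on `check_metrics()` recognising the "censored regression" mode.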

amcmahon17 commented 1 year ago

I'm a big fan of this package.

I ran into this same issue today and I'm just wondering where this is in the pipeline at this point. Thank you!

hfrick commented 1 year ago

We are actively working on adding metrics for censored regression to yardstick; anything related to tuning is next on the list!

amcmahon17 commented 1 year ago

> We are actively working on adding metrics for censored regression to yardstick, anything related to tuning is on the list next!

Excellent! Sorry for my impatience. 🎉

hfrick commented 1 year ago

All good, it's nice to know that it will be used!

hfrick commented 1 year ago

Things are coming together! We have improved how workflows with censored regression models are handled, we have performance metrics for these models, and we can tune them. This is currently all in various development versions.

We also have a write-up on the performance metrics on tidymodels.org: https://www.tidymodels.org/learn/statistics/survival-metrics/

# install dev versions
#pak::pak(c("tidymodels/censored", "tidymodels/parsnip", "tidymodels/tune"))

library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(rsample)
library(workflows)
library(tune)

data(cancer)

lung <- lung |> 
  tidyr::drop_na() |>
  dplyr::mutate(surv = Surv(time, status), .keep = "unused")

set.seed(1)
lung_resamples <- vfold_cv(lung)

sr_mod <- 
  survival_reg(dist = "weibull") %>%
  set_engine("survival") %>% 
  set_mode("censored regression")

# workflow now works also with `add_formula()` if it's not stratified
sr_wfl <- workflow() |> 
  add_formula(surv ~ .) |> 
  add_model(sr_mod)

plain_fit <- fit(sr_wfl, lung) 

# fit_resamples() now works!
sr_fit <- sr_wfl |>
  fit_resamples(lung_resamples, eval_time = c(100, 500))

Created on 2023-07-19 by the reprex package (v2.0.1)
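
To inspect the resampled performance, the results can be summarised with collect_metrics(). The sketch below assumes the installed versions of censored, parsnip, tune, workflows and yardstick already include the survival support described in this comment; the explicit metric set, the 5 folds, and the names `lung_surv`, `surv_metrics` and `sr_res` are illustrative choices, not something the reprex above prescribes.

```r
library(censored)
library(rsample)
library(workflows)
library(tune)
library(yardstick)

data(cancer, package = "survival")
lung_surv <- lung |>
  tidyr::drop_na() |>
  dplyr::mutate(surv = Surv(time, status), .keep = "unused")

set.seed(1)
lung_resamples <- vfold_cv(lung_surv, v = 5)

sr_wfl <- workflow() |>
  add_formula(surv ~ .) |>
  add_model(
    survival_reg(dist = "weibull") |>
      set_engine("survival") |>
      set_mode("censored regression")
  )

# One dynamic metric (evaluated at each eval_time) and one static metric
surv_metrics <- metric_set(brier_survival, concordance_survival)

sr_res <- fit_resamples(
  sr_wfl,
  resamples = lung_resamples,
  metrics = surv_metrics,
  eval_time = c(100, 500)
)

# Averages across folds; dynamic metrics get one row per evaluation time
collect_metrics(sr_res)
```

The Brier score is computed at each value of `eval_time`, while the concordance index does not depend on an evaluation time.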

amcmahon17 commented 1 year ago

This is really fantastic! Thank you so much.


hfrick commented 1 year ago

Forgot to close this when I posted the update on the dev versions ⚡

github-actions[bot] commented 11 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.