tidymodels / censored

Parsnip wrappers for survival models
https://censored.tidymodels.org/
Other
123 stars 12 forks source link

Multiple survival curves for single observation from stratified survival model #41

Closed hfrick closed 2 years ago

hfrick commented 3 years ago

For a stratified Cox model, survival::survfit() should return a single survival curve per observation if the strata information is given in the newdata arg. If no information on the strata is given, it returns a curve for each stratum. This works as intended for newdata data frame which contain multiple observations, however if newdata only contains a single observation, survfit() returns a curve per stratum, even if the information for the stratum is given in newdata.

library(survival)

cfit <- coxph(Surv(time, status) ~ age + sex + wt.loss + strata(inst),
              data = lung)

# no strata info --> survival curves for all strata
new_data_0 <- expand.grid(age = c(50, 60), sex = 1, wt.loss = 5)
csurv_0 <- survfit(cfit, newdata = new_data_0)
dim(csurv_0)
#> strata   data
#>     18      2

# add strata info --> survival curves for only those
new_data_n <- data.frame(age = c(50, 60), sex = 1:2, wt.loss = 5, inst = c(6, 11))
csurv_n <- survfit(cfit, newdata = new_data_n)
dim(csurv_n)
#> strata
#>      2

# single observation with strata info --> should be one curve but gives all
new_data_1 <- data.frame(age = 60, sex = 2, wt.loss = 5, inst = 11)
csurv_1 <- survfit(cfit, newdata = new_data_1)
dim(csurv_1)
#> strata
#>     18

Created on 2021-03-29 by the reprex package (v1.0.0)

This has been reported as an issue for the {survival} repo.

Note to us: once this is fixed in {survival} check back here that this translates nicely to {censored}.

library(parsnip)
library(survival)
library(censored)

cox_model <- proportional_hazards() %>%
  set_engine("survival")
cox_fit_strata <- cox_model %>%
  fit(Surv(time, status) ~ age + sex + wt.loss + strata(inst), data = lung)

# strata info for n observations
new_data_n <- data.frame(age = c(50, 60), sex = 1:2, wt.loss = 5, inst = c(6, 11))
predict(cox_fit_strata, new_data = new_data_n, type = "time") # 2 rows = 2 observations
#> # A tibble: 2 x 1
#>   .pred_time
#>        <dbl>
#> 1       339.
#> 2       500.

# stratum info for single observation
new_data_1 <- data.frame(age = 60, sex = 2, wt.loss = 5, inst = 11)
predict(cox_fit_strata, new_data = new_data_1, type = "time") # gives 18 values (for all strata) when it should only give 1
#> # A tibble: 18 x 1
#>    .pred_time
#>         <dbl>
#>  1       497.
#>  2       364.
#>  3       532.
#>  4       651.
#>  5       555.
#>  6       421.
#>  7       558.
#>  8       449.
#>  9       540.
#> 10       511.
#> 11       612.
#> 12       624.
#> 13       583.
#> 14       364.
#> 15       589.
#> 16       818.
#> 17       792.
#> 18      1022

Created on 2021-03-29 by the reprex package (v1.0.0)

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.