tidymodels / censored

Parsnip wrappers for survival models
https://censored.tidymodels.org/
Other
123 stars 12 forks source link

Refactor calculation of survival probabilities for mboost #215

Closed hfrick closed 2 years ago

hfrick commented 2 years ago

Part of #146

This refactors the calculation of the survival probabilities for mboost models based on their survFit objects.

This changes behavior for the edge cases of predicting before or after event times. Do you think this is the right behavior? At time Inf it currently returns the survival probability at the last event but I'm inclined to think it should always be 0. Similarly, should we make return a probability of 1 for -Inf? And if so, do we make sure this also happens for other engines?

# what we used previously:
floor_surv_mboost <- function(x, time) {
  ind <- purrr::map_int(time, ~ max(which(.x > c(-Inf, unname(x$time)))))
  t(unname(rbind(1, x$surv))[ind, ])
}

library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(mboost)
#> Loading required package: parallel
#> Loading required package: stabs
lung2 <- lung[-14, ]
lung_pred <- lung2[1:3,]

lung2$time[1] <- -3
mboost_object <- blackboost(Surv(time, status) ~ age + ph.ecog,
                           data = lung2, family = CoxPH())
survFit_object <- survFit(mboost_object, newdata = lung_pred)

# prediction time before 1st event time ---------------------------------

# this falls back to survival prob = 1 if the first event happened after time 0

# if the first event happens before 0, this used to also fall back to prob 1
floor_surv_mboost(survFit_object, -5)
#>      [,1] [,2] [,3]
#> [1,]    1    1    1

# in the new version, this now falls back to the survival prob at that first event
# not "anchoring" at time 0 (since the first event is before that)
survival_prob_mboost(mboost_object, new_data = lung_pred, time = -5) %>%
  tidyr::unnest(cols = .pred)
#> # A tibble: 3 × 2
#>   .time .pred_survival
#>   <dbl>          <dbl>
#> 1    -5          0.996
#> 2    -5          0.995
#> 3    -5          0.998

# this is also what happens in the survival package
survfit_object <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung2) %>%
  survfit(newdata = lung_pred)
summary(survfit_object, time = -5)$surv
#>              1         2        3
#> [1,] 0.9951563 0.9970903 0.997481

# infinite prediction times ----------------------------------------------

# this used to error
floor_surv_mboost(survFit_object, c(-Inf, Inf))
#> Warning in max(which(.x > c(-Inf, unname(x$time)))): no non-missing arguments to
#> max; returning -Inf
#> Error: Can't coerce element 1 from a double to a integer

# now it falls back to the survival prob at the first and last event respectively
survival_prob_mboost(mboost_object, new_data = lung_pred, time = c(-Inf, Inf)) %>%
  tidyr::unnest(cols = .pred)
#> # A tibble: 6 × 2
#>   .time .pred_survival
#>   <dbl>          <dbl>
#> 1  -Inf         0.996 
#> 2   Inf         0.0366
#> 3  -Inf         0.995 
#> 4   Inf         0.0226
#> 5  -Inf         0.998 
#> 6   Inf         0.212

# this is different from the survival package which removes the infinite values
summary(survfit_object, time = c(-Inf, Inf), extend = TRUE)$surv
#> Error in findrow(fit, times, extend): no points selected for one or more curves, consider using the extend argument

# should -Inf fall back to 1 and Inf to 0 instead?

Created on 2022-09-19 by the reprex package (v2.0.1)

hfrick commented 2 years ago

update: I'll make this return a survival probability of 1 for time -Inf and 0 for time Inf

github-actions[bot] commented 2 years ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.