Feature request: Combine multiple survfits

DanChaltiel commented 1 month ago

Is your feature request related to a problem? Please describe. When I'm only describing the survival of a population (i.e. with no grouping variable), I would like to be able to show both OS and PFS on the same graph.

Describe the solution you'd like The solution is described in https://github.com/kassambara/survminer/issues/195.

Describe alternatives you've considered Alternative solutions would be (1) to use survminer, but the project is unfortunately not maintained anymore and I like the interface of ggsurvfit better, or (2) to plot both graphs separately.

Additional context Here is a screenshot of the above-mentioned issue:

ddsjoberg commented 1 month ago

Hi @DanChaltiel , thanks for the post!

I think this is something that people would need, so we should think on the best way to implement.

Certainly, the easiest way for me to maintain is ask the users to do some pre-processing and create a data frame that is one line per subject per outcome. The data frame would include a column indicating which outcome the row is associated with. We can then pass that data frame to survfit(), where we stratify the figures by outcome and we get what we want. Then there are zero updates need to the core functionality in the package.

Maybe we simply add a helper function to do this pre-processing? Minimally, we can add an example to the gallery showing how to do the pre-processing.

What do you think?

library(ggsurvfit)
#> Loading required package: ggplot2

packageVersion("ggsurvfit")
#> [1] '1.1.0'


# create a stacked data frame with 2 outcomes
stacked_adtte <-
  adtte |> 
  dplyr::select(USUBJID, PARAM, AVAL, CNSR) %>%
  dplyr::bind_rows(
    .,
    dplyr::mutate(
      ., 
      AVAL = AVAL / (1 + runif(dplyr::n())),
      PARAM = "Overall survival (years)"
    )
  )

survfit2(Surv_CNSR(AVAL, CNSR) ~ PARAM, data = stacked_adtte) |> 
  ggsurvfit() +
  add_confidence_interval() +
  add_risktable() +
  scale_ggsurvfit()

^{Created on 2024-05-17 with reprex v2.1.0}

DanChaltiel commented 1 month ago

Oh you are right, you can simply stack the data! Indeed, a helper function would be enough, thanks for considering the FR :-)

I'm not sure about how the interface should look like though. In a real case, there are several couples time/event in a single dataframe, so Surv(AVAL, CNSR) in your example might not always make sense. Maybe the helper could have some time_to="stacked_time" and event_to="stacked_event" arguments that specify the names the used should put in Surv.

In the end, you would input your couples somehow like this:

survs = list(
  "Overall survival (years)"=c("os.time", "os.status"),
  "Progression-free survival (years)"=c("pfs.time", "pfs.status")
)
stacked_adtte = stack_surv(adtte, survs=survs)

ddsjoberg commented 1 month ago

Couple of things to consider:

The Surv() function has arguments c(time, time2, event). We probably want to handle all of these.
This will be complicated with competing risks models, as the event is a factor. For example, a cancer recurrence outcome may have factor levels, c("censored", "recurrence", "death from other causes"), and a death from cancer outcome would have levels c("censored", "death from cancer", "death from other causes"). We'd some way to reconcile these varying levels.

Anyway, I'll leave this issue open for now. But to be honest, no plans to implement in the immediate future (just to set expectations!)

DanChaltiel commented 1 month ago

No problem, this is perfectly understandable! In the meantime, here is the function that I'll be using to handle the issue. For point (2) I'm not very familiar with competing risk visualization, but if it is only a problem of binding multiple factors, maybe adding stacked_event = as.character(stacked_event) in the mutate calls would be enough.

library(tidyverse)
library(ggsurvfit)

stack_surv = function(data, surv_list){
  surv_list %>% 
    imap(~{
      if(length(.x)==2){
        df = data %>% 
          select(stacked_time = all_of(.x[1]), 
                 stacked_event = all_of(.x[2])) %>% 
          mutate(stacked_survtype=.y)
      } else if (length(.x) == 3){
        df = data %>% 
          select(stacked_time = all_of(.x[1]), 
                 stacked_time2 = all_of(.x[2]), 
                 stacked_event =all_of(.x[3])) %>% 
          mutate(stacked_survtype=.y)
      } else {
        stop("Length 2 or 3 only")
      }
    }) %>% 
    list_rbind()
}

set.seed(42)
df_surv = survival::lung %>% 
  as_tibble() %>% 
  mutate(
    time_os = time,
    event_os = status==2,
    time_pfs = time / (1 + runif(dplyr::n())),
    event_pfs = ifelse(runif(dplyr::n())>0.3, event_os, 1),
    time_efs = time_pfs / (1 + runif(dplyr::n())),
    event_efs = ifelse(runif(dplyr::n())>0.3, event_pfs, 1),
  )

#Right censoring
surv_list = list(
  "Overall survival"=c("time_os", "event_os"),
  "Progression-free survival"=c("time_pfs", "event_pfs"),
  "Event-free survival"=c("time_efs", "event_efs")
)
stack_surv(df_surv, surv_list) %>% 
  survfit2(formula=Surv(stacked_time, stacked_event) ~ stacked_survtype, 
           data = .) %>% 
  ggsurvfit() +
  add_confidence_interval() +
  add_risktable() +
  scale_ggsurvfit()

#Interval censoring (dummy)
surv_list2 = list(
  "Overall survival"=c("time_os", "time_pfs", "event_os"),
  "Progression-free survival"=c("time_os", "time_pfs", "event_pfs")
)
stack_surv(df_surv, surv_list2) %>% 
  survfit2(formula=Surv(stacked_time, stacked_event) ~ stacked_survtype, 
           data = .) %>% 
  ggsurvfit() +
  add_confidence_interval() +
  add_risktable() +
  scale_ggsurvfit()

^{Created on 2024-05-18 with reprex v2.1.0}

pharmaverse / ggsurvfit

Feature request: Combine multiple survfits #212