Plotting multiple survival curves using ggsurvplot

kassambara commented 6 years ago

(e-mail from a user)

Hi there, I have a list of 100 survfit objects. Each of these has two survival curves (predicted-low-survival, predicted-high-survival). I would like to show a view of the two curves averaged across the 100, with confidence intervals. Is there a way to do this in one function call and get a plot with error bars like this? https://ncss-wpengine.netdna-ssl.com/wp-content/uploads/2012/10/SurvivalPlot.png

Alternatively, I could access the plotted [x,y,group] information from ggsurvplot and create a new ggplot with the average.

Example code:

fitList # list of size 100 with 100 survfit objects
pooled_info <- c()
for (k in 1:100) {
  cur <- ggsurvplot(fitList[[k]])
  pooled_info <- rbind(pooled_info, c(cur$plotted_x, cur$plotted_y, cur$classes))
}
p <- ggplot(pooled_info, aes(x=x, y=y, colour=group)) + geom_line() + stat_summary()...

Any help would be great, thanks!

kassambara commented 6 years ago

I think that what you need is this:

library(survival)
library(survminer)
data(lung)

# Compute survival curves

# Average survival curves = null model
fit.null <- survfit(Surv(time, status)~1, data = lung)  

fit1 <- survfit(Surv(time, status) ~ ph.ecog, data = lung)
fit2 <- survfit(Surv(time, status) ~sex, data = lung)

# Combine survival curves
fit.list <- list(
  ph.ecog = fit1, sex = fit2, fit.null = fit.null
  )
ggsurv <- ggsurvplot(fit.list, data = lung, censor = FALSE,
          combine = TRUE, keep.data = TRUE)
ggsurv

(image: rplot22, the combined survival plot produced by the code above)

Or maybe you want to do it (semi-)manually:


fit.list <- list(
  ph.ecog = fit1, sex = fit2
  )
ggsurv <- ggsurvplot(fit.list, data = lung, censor = FALSE,
          combine = TRUE, keep.data = TRUE, 
          palette = "Dark2", legend = "right")

# Add null model manually
library(dplyr)
# as_tibble() replaces the deprecated as_data_frame()
summary.null <- surv_summary(fit.null) %>% 
  as_tibble()

ggsurv$plot +
  geom_step(
    aes(time, surv), data = summary.null,
    color = "black", size = 2
    )

# Access to the data used to create the combined survival curves
ggsurv$data.survplot

# A tibble: 608 x 6
    time n.censor      surv     upper     lower             strata
 1     5        0 0.9841270 1.0000000 0.9537433 ph.ecog::ph.ecog=0
 2    11        0 0.9682540 1.0000000 0.9259146 ph.ecog::ph.ecog=0
 3    15        0 0.9523810 1.0000000 0.9012200 ph.ecog::ph.ecog=0
 4    31        0 0.9365079 0.9986992 0.8781894 ph.ecog::ph.ecog=0
 5    53        0 0.9206349 0.9898618 0.8562495 ph.ecog::ph.ecog=0
 6    65        0 0.9047619 0.9802300 0.8351041 ph.ecog::ph.ecog=0
 7    81        0 0.8888889 0.9699805 0.8145766 ph.ecog::ph.ecog=0
 8   147        0 0.8730159 0.9592292 0.7945512 ph.ecog::ph.ecog=0
 9   166        0 0.8571429 0.9480567 0.7749472 ph.ecog::ph.ecog=0
10   175        1 0.8571429 0.9480567 0.7749472 ph.ecog::ph.ecog=0
# ... with 598 more rows
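
A data frame like the one printed above is enough to redraw the curves with plain ggplot2, which is the "access the plotted information" route raised in the original question. The following is only a minimal sketch: it assumes survminer's `surv_summary()` called directly on a single fit (`fit1` from above) rather than the combined `ggsurv$data.survplot`, and the object names `df` and `p` are made up for illustration.

```r
library(survival)
library(survminer)
library(ggplot2)

# Same model as fit1 above: Kaplan-Meier curves by ECOG performance score
fit1 <- survfit(Surv(time, status) ~ ph.ecog, data = lung)

# Tidy data frame with one row per time point and stratum
# (columns include time, surv, upper, lower and strata)
df <- surv_summary(fit1, data = lung)

# Redraw the curves by hand; any further ggplot2 layer can be added to p
p <- ggplot(df, aes(x = time, y = surv, colour = strata)) +
  geom_step() +
  labs(x = "Time", y = "Survival probability")
p
```

The `upper` and `lower` columns hold each curve's own confidence limits, so they describe the uncertainty of one fit, not the variation between several fits.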

shraddhapai commented 6 years ago

Hi @kassambara, thanks for the response! In my case, I want to look at the average and variation over the entries of the list. The setting is that we have run a predictor 100 times, and each time the predictor "calls" the patient subtype. We want to see the variation in the average survival curve for each predicted subtype across the 100 iterations.

So the input data for the SurvObj is different for each round. Each entry of fit.list has two survival curves (patients of type A and type B), and I have 100 such entries.

In the final figure I am aiming to see two KM curves, one for A and one for B, showing the variation in the survival profile across the 100 iterations. So the confidence interval shading for A would show the variation in the survival curve for A across the 100 iterations. The same for B.

The figure above seems to overlay different KM curves but does not average them, I think?

Thanks, Shraddha
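
One possible way to get such a figure, offered as a sketch and not as a built-in survminer feature: evaluate every fit on a shared time grid, then take the pointwise mean and empirical 2.5%/97.5% quantiles across iterations for each group. Everything below is an assumption made for illustration: the bootstrap resamples of `lung` split by `sex` merely stand in for the 100 real fits, and the names `fitList`, `grid`, `curves` and `avg` as well as the 10-day grid are hypothetical choices.

```r
library(survival)
library(ggplot2)

# Stand-in for the real list of 100 survfit objects (hypothetical data):
# each fit uses a bootstrap resample of lung, with two groups given by sex.
set.seed(1)
fitList <- lapply(1:100, function(k) {
  boot <- lung[sample(nrow(lung), replace = TRUE), ]
  survfit(Surv(time, status) ~ sex, data = boot)
})

# Evaluate every curve at the same time points so they can be averaged.
# extend = TRUE carries the last estimate forward past a curve's final time.
grid <- seq(0, 900, by = 10)
curves <- do.call(rbind, lapply(seq_along(fitList), function(k) {
  s <- summary(fitList[[k]], times = grid, extend = TRUE)
  data.frame(iter = k, time = s$time, surv = s$surv, group = s$strata)
}))

# Pointwise mean and empirical 95% band across the iterations, per group
avg <- do.call(rbind, lapply(
  split(curves, list(curves$time, curves$group), drop = TRUE),
  function(d) {
    data.frame(
      time  = d$time[1],
      group = d$group[1],
      mean  = mean(d$surv),
      lower = quantile(d$surv, 0.025, names = FALSE),
      upper = quantile(d$surv, 0.975, names = FALSE)
    )
  }
))

# One averaged curve per group, with a ribbon showing between-iteration spread
p <- ggplot(avg, aes(x = time, y = mean, colour = group, fill = group)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3, colour = NA) +
  geom_line() +
  labs(x = "Time", y = "Mean survival probability across iterations")
p
```

The ribbon here reflects only the spread between iterations, not the sampling uncertainty within each Kaplan-Meier fit, and the group labels must mean the same thing in every list entry for the averaging to make sense.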

kassambara / survminer

Plotting multiple survival curves using ggsurvplot #261