kassambara / survminer

Survival Analysis and Visualization
https://rpkgs.datanovia.com/survminer/
501 stars 162 forks source link

A workaround function for combining Faceting and Risk Tables #370

Open BingxinS opened 5 years ago

BingxinS commented 5 years ago

Similar issue reported at: https://github.com/kassambara/survminer/issues/330

Expected behavior

Use data frame lung as an example:

library(survminer)
require("survival")
fit = survfit(Surv(time, status) ~ sex, data = lung)

Survival plots can be generated with risk table:

ggsurvplot(fit, risk.table = T)

or been faceted by factor ph.ecog:

ggsurvplot(fit, facet.by = "ph.ecog")

However, the risk table can not be generated with faceting in the current Survminer package:

ggsurvplot(fit, facet.by = "ph.ecog", risk.table = T)

Actual behavior

The plot generated from the above code does not include risk tables.

Steps to reproduce the problem

ggsurvplot(fit, facet.by = "ph.ecog", risk.table = T)

The output plot is missing the risk tables Image of Yaktocat

A workaround function

A way to work around is to generate the survival plotseparatelytable separately and combine them on the same plot.

The plot

Use ggsurvplot_facet_fix() to obtain the survival plots, with fixed origins (refer to the previous issue report, related issue: https://github.com/kassambara/survminer/issues/254 and pull request https://github.com/kassambara/survminer/pull/363).

fit <- survfit(Surv(time, status) ~  sex, lung) 
plt_fct <- ggsurvplot_facet_fix(fit, lung, facet.by="ph.ecog")

The risk table

Although neither ggsurvplot() nor ggsurvplot_fix() generates the risk table in a plot, the data of risk table is available as ggsurv$table$data:

fit <- survfit(Surv(time, status) ~  sex + ph.ecog, lung)
ggsurv <- ggsurvplot(fit, lung,
                     risk.table = TRUE, 
                     ggtheme = theme_bw())

Below is the risk table data returned by ggsurvplot(): Image of Yaktocat

We can use ggplot2 to plot the risk table directly:

tbl_fct <- 
  ggplot(ggsurv$table$data, aes(time, sex)) + 
  geom_text(aes(label = n.risk)) +
  facet_wrap(~ph.ecog)

Image of Yaktocat

We can format the above risk table as:

text.size <- 10
tbl_fct <- 
  ggplot(ggsurv$table$data, aes(time, sex)) + 
  geom_text(aes(label = n.risk), size = max (2, (text.size-6)) ) +
  facet_wrap(~ph.ecog, labeller = label_both) + 
  theme_bw() + 
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        strip.background = element_rect(fill="white", size = 1),
        axis.text.x = element_text(color="black", size = text.size+2),
        axis.text.y = element_text(color="black", size = text.size),
        axis.title = element_text(color="black", size = text.size+4),
        strip.text = element_text(color="black", size = text.size))+
  scale_y_discrete(limits=c("2","1"), labels=c("2" = "sex=2", "1" = "sex=1")) +
  ggtitle("Risk Tables") +
  xlab("Time") + ylab("") 

Image of Yaktocat

Combine the survival plots and risk tables

We now combine the above plot and table together:

g_plt_fct <- ggplotGrob(plt_fct)
g_tbl_fct <- ggplotGrob(tbl_fct)
min_ncol <- min(ncol(g_plt_fct), ncol(g_tbl_fct))
g_plt_tbl <- 
  gridExtra::gtable_rbind(g_plt_fct[, 1:min_ncol], g_tbl_fct[, 1:min_ncol], size="last")
g_plt_tbl$widths <- 
  grid::unit.pmax(g_plt_fct$widths, g_tbl_fct$widths)
grid::grid.newpage()
grid::grid.draw(g_plt_tbl)

Image of Yaktocat

Wrap up as function

I had warped it as a function and available at: https://github.com/BingxinS/survminer-fix/blob/master/ggsurvplot_facet_risktable.R Examples are available: https://github.com/BingxinS/survminer-fix/blob/master/README_facet_risktable.md

Example I: with 1 facet factor ph.ecog

library(survminer)
require("survival")
source("https://raw.githubusercontent.com/BingxinS/survminer-fix/master/ggsurvplot_facet.risktable.R")
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
data <- lung
fit <- survfit(Surv(time, status) ~ sex + ph.ecog, data)
ggsurvplot_facet_risktable(fit, data, risktable.ylab="Number at Risk")

These lines will generate the same plot just shown above.

Example II: with 2 facet factors ph.ecog and ph.karno

library(survminer)
require("survival")
source("https://raw.githubusercontent.com/BingxinS/survminer-fix/master/ggsurvplot_facet.risktable.R")
#::::::::::::::::::::::::::::::::::::::::::::::::::::::::
data <- lung
fit <- survfit(Surv(time, status) ~  sex + ph.ecog + ph.karno, data)
ggsurvplot_facet_risktable(fit, data)

Image of Yaktocat

Extra discussion

In all the examples above, the starta factor is sex, which has two levels, 1 and 2. Once been faceted, for instant, by facet factor ph.ecog, which has four levels, 0, 1, 2, and 3, we will not be able to generate a combined facet plot with facet risk table with the current survminer package. The main reason of missing risk table when faceting is caused by the colum strata in the data table:

ggsurvplot_facet(fit, data, facet.by = "ph.ecog")$data

Image of Yaktocat After faceting, strata has 7 levels in this example: "sex=1, ph.ecog=0" "sex=1, ph.ecog=1" "sex=1, ph.ecog=2" "sex=1, ph.ecog=3" "sex=2, ph.ecog=0" "sex=2, ph.ecog=1" "sex=2, ph.ecog=2" But in a faceted sub plot for risk table, we wanted to plot the real 2 levels of strata with sex=1 and sex=2. This is the cause that I can track down in the current survminer package. This naming system is not easy to fix, i.e. separating the mixed strata and facets levels. The same reason also prevent the confidence interval being plotted when combined with faceting. I will try to do a similar workaround function for that.

BingxinS commented 5 years ago

A workaround function in #371 can generate both risk tables and confidence intervals when facetted.

djnusbaum commented 1 year ago

Thanks for figuring this out. Is there any way to switch the order that the strata labels are listed on the left side of the "number at risk" table? For example, I want strata #1 to be listed first, and then strata #2 (currently #2 is listed first). Factor level doesn't seem to influence this.

Also, is there an easy way to customize the plots (e.g. change the color palette, add p-value to the plots)? Or would I need to modify the functions directly?

Thanks!