kassambara / survminer

Survival Analysis and Visualization
https://rpkgs.datanovia.com/survminer/
491 stars 160 forks source link

ggsurvplot does not display groups in the risk table for survfit.cox object #231

Open MarcinKosinski opened 7 years ago

MarcinKosinski commented 7 years ago

I am going through your great tutorial about cox proportional hazards mordel. I've found a very useful tool for visualizing the estimated survival curves that are based on the hazard estimates and baseline survival from the cox ph model.

Expected behavior

One receive survival curves (not the KM estimates this time, but the survival estimates that comes from the cox phmodel) with a risk table that is divided to 2 groups

library(survminer)
library(survival)
res.cox <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data =  lung)
summary(res.cox)
ggsurvplot(survfit(res.cox), color = "#2E9FDF",
           ggtheme = theme_minimal())
sex_df <- with(lung,
               data.frame(sex = c(1, 2), 
                          age = rep(mean(age, na.rm = TRUE), 2),
                          ph.ecog = c(1, 1)
               )
)
sex_df

fit <- survfit(res.cox, newdata = sex_df)
ggsurvplot(fit, conf.int = TRUE, legend.labs=c("Sex"),
           ggtheme = theme_minimal(), risk.table = TRUE)

Actual behavior

The risk.table appear to produce only one group

surv_one_table

Additional note

If one does not specify the risk.table to be FALSE then one can have the legend.labs to be equal to the number of groups:

ggsurvplot(fit, conf.int = TRUE, legend.labs=c("Sex=1", "Sex=2"),
           ggtheme = theme_minimal())

but when one launches the risk table, then the legend.labs can only have length 1

> ggsurvplot(fit, conf.int = TRUE, legend.labs=c("Sex=1", "Sex=2"),
+            ggtheme = theme_minimal(), risk.table = TRUE)
 Show Traceback

 Rerun with Debug
 Error in factor(survsummary$strata, labels = legend.labs) : 
  invalid 'labels'; length 2 should be 1 or 1 

I think this is due to the one group in the risk table. And when someone specifies the legend.labs to have vector equal to 1 you only specify the strata label in the risk set table but in the survival plot it appears that levels are added to the labels. But this isn't the intuitive behaviour. The legend.labs parameter should work this way that it uses the provided character in the scale_colour_discrete and in the risk table label. Right now for risk.table=TRUE one passes the risk table label (which actually shouldn't be of dimension one) with legend.labs and has to add the scale_colour_discrete to customize the survival plot legends

ggsurvplot(fit, conf.int = TRUE, legend.labs=c("risk_table_LABEL"),
           ggtheme = theme_minimal(), risk.table = TRUE) +
  scale_colour_discrete(labels = c('character1', 'othercharacter2'))
scale_colour_discrete

This is rather complex problem and might not be easy to solve.

session_info()

# please paste here the result of
> devtools::session_info()
Session info ------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.3 (2017-03-06)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language (EN)                        
 collate  Polish_Poland.1250          
 tz       Europe/Warsaw               
 date     2017-06-18                  

Packages ----------------------------------------------------------------------
 package    * version   date       source        
 assertthat   0.2.0     2017-04-11 CRAN (R 3.3.3)
 backports    1.0.5     2017-01-18 CRAN (R 3.3.2)
 broom        0.4.2     2017-02-13 CRAN (R 3.3.3)
 cmprsk       2.2-7     2014-06-17 CRAN (R 3.3.3)
 colorspace   1.3-2     2016-12-14 CRAN (R 3.3.3)
 data.table   1.10.4    2017-02-01 CRAN (R 3.3.2)
 DBI          0.6-1     2017-04-01 CRAN (R 3.3.3)
 devtools   * 1.12.0    2016-06-24 CRAN (R 3.3.3)
 digest       0.6.12    2017-01-27 CRAN (R 3.3.3)
 dplyr        0.5.0     2016-06-24 CRAN (R 3.3.3)
 evaluate     0.10      2016-10-11 CRAN (R 3.3.3)
 foreign      0.8-67    2016-09-13 CRAN (R 3.3.3)
 ggplot2    * 2.2.1     2016-12-30 CRAN (R 3.3.3)
 ggpubr     * 0.1.2     2017-03-14 CRAN (R 3.3.3)
 gridExtra    2.2.1     2016-02-29 CRAN (R 3.3.3)
 gtable       0.2.0     2016-02-26 CRAN (R 3.3.3)
 htmltools    0.3.5     2016-03-21 CRAN (R 3.3.3)
 km.ci        0.5-2     2009-08-30 CRAN (R 3.3.3)
 KMsurv       0.1-5     2012-12-03 CRAN (R 3.3.2)
 knitr        1.15.1    2016-11-22 CRAN (R 3.3.3)
 labeling     0.3       2014-08-23 CRAN (R 3.3.2)
 lattice      0.20-34   2016-09-06 CRAN (R 3.3.3)
 lazyeval     0.2.0     2016-06-12 CRAN (R 3.3.3)
 magrittr     1.5       2014-11-22 CRAN (R 3.3.3)
 Matrix       1.2-8     2017-01-20 CRAN (R 3.3.3)
 memoise      1.0.0     2016-01-29 CRAN (R 3.3.3)
 mnormt       1.5-5     2016-10-15 CRAN (R 3.3.2)
 munsell      0.4.3     2016-02-13 CRAN (R 3.3.3)
 nlme         3.1-131   2017-02-06 CRAN (R 3.3.3)
 plyr         1.8.4     2016-06-08 CRAN (R 3.3.3)
 psych        1.7.3.21  2017-03-22 CRAN (R 3.3.3)
 R6           2.2.0     2016-10-05 CRAN (R 3.3.3)
 Rcpp         0.12.10   2017-03-19 CRAN (R 3.3.3)
 reshape2     1.4.2     2016-10-22 CRAN (R 3.3.3)
 rmarkdown    1.4       2017-03-24 CRAN (R 3.3.3)
 rprojroot    1.2       2017-01-16 CRAN (R 3.3.3)
 scales       0.4.1     2016-11-09 CRAN (R 3.3.3)
 stringi      1.1.5     2017-04-07 CRAN (R 3.3.3)
 stringr      1.2.0     2017-02-18 CRAN (R 3.3.3)
 survival   * 2.40-1    2016-10-30 CRAN (R 3.3.3)
 survminer  * 0.3.1.999 2017-04-15 local         
 survMisc     0.5.4     2016-11-23 CRAN (R 3.3.3)
 tibble       1.3.0     2017-04-01 CRAN (R 3.3.3)
 tidyr        0.6.1     2017-01-10 CRAN (R 3.3.3)
 withr        1.0.2     2016-06-20 CRAN (R 3.3.3)
 xtable       1.8-2     2016-02-05 CRAN (R 3.3.3)
 yaml         2.1.14    2016-11-12 CRAN (R 3.3.3)
 zoo          1.7-14    2016-12-16 CRAN (R 3.3.3)
kassambara commented 7 years ago

When fit = survfit(res.cox, newdata), the output of summary(fit) looks like this:

capture d ecran 2017-06-19 a 10 58 59

From this output, we have only:

This is why the risktable from the ggsurvplot of an object of survfit.cox contains only one row corresponding to the whole data set.

It seems that the risk table for groups are available only from the output of survfit.formula() (KM).

MarcinKosinski commented 7 years ago

: /

kassambara commented 7 years ago

If users want to add grouped risk tables under cox adjusted survival curves, the following hack might be helpful:

require("survival")
require("survminer")
km.fit <- survfit(Surv(time, status) ~ sex, data = lung)
km.surv <- ggsurvplot(km.fit, risk.table = TRUE)
km.surv
res.cox <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data =  lung)
sex_df <- with(lung,
               data.frame(sex = c(1, 2), 
                          age = rep(mean(age, na.rm = TRUE), 2),
                          ph.ecog = c(1, 1)
               )
)
sex_df
fit <- survfit(res.cox, newdata = sex_df)

cox.surv <- ggsurvplot(fit, conf.int = TRUE)
cox.surv
cox.surv$table <- km.surv$table
print(cox.surv, risk.table.height = 0.3)

rplot

Let me know, if you have another suggestion...

kassambara commented 7 years ago

Two suggestions:

pbiecek commented 7 years ago

I am planning to add risktables to ggcoxadjustedcurves() after finishing #229 If you see some easy fix for that, will be very welcomed

kassambara commented 7 years ago

Yes, i'll update ggcoxadjustedcurves() and ggsurvplot() to handle this situation.

pbiecek commented 7 years ago

@kassambara great, but please wait with ggcoxadjustedcurves() are they are being updated right now

kassambara commented 7 years ago

ok, thanks.

bounlu commented 6 years ago

I have the same issue. Moreover it also results in error when pval = T as it recognizes the whole strata as 1 group although it plots the multiple groups correctly:

3: In .pvalue(fit, data = data, method = method, pval = pval, pval.coord = pval.coord,  :
  There are no survival curves to be compared. 
 This is a null model.
mzcs7 commented 5 years ago

@bounlu Hi,have you fixed this problem? I