kassambara / survminer

Survival Analysis and Visualization
https://rpkgs.datanovia.com/survminer/
495 stars 160 forks source link

[Proposition] Cox proportional hazards assumption test based on Schoenfeld Residuals #13

Closed MarcinKosinski closed 8 years ago

MarcinKosinski commented 8 years ago

So far survminer provides a great tool to display the p-value of the log-rank test for plotted Kaplan-Meier estimates of the survival curves divided on strata.

In many cases the next step in survival analysis, after plotting the survival curves, is fitting the Cox proportional hazards model. I have seen many research posters where biologists shows plots of survival curves that intersect but the p-value of log-rank test allows for the rejection of the null hypothesis (but of the log-rank TEST!) so researchers proceed with Cox model.

Maybe if survminer would support also Cox proportional hazards assumption test that is based on Schoenfeld Residuals, then one could also be aware of the p-value of the assumption test for Cox model https://stat.ethz.ch/R-manual/R-devel/library/survival/html/cox.zph.html.

If thats suits you then pval parameter could be changed to pval.logrank and the new parameter pval.cox.zph could be added and both p-values could be printed somehow. If that suits you then I could prepare a PR.

I know that in most cases survival curves are plotted separately for each variable and the Schoenfeld test should be done for all variables together but this functionality could be useful if one would fit the cox model with 1 variable.

Other version of this proposition

Prepare ggsurvplot.cox.zph function (or it's overload for this class of object) that would use grid.arrange to plot many survival curves for different variables (one above another, with 1 risk table at the bottom) and would add a p-value of each variable of the Schoenfeld test to the correspoding survival plot and would add global p-value of the test to tne bottow of the graph. This functionality could be great for models with 2-10 variables, and the output could fit on 1 A4 page of article.

What do you think about that? Would you like such functionality and maybe a sweet PR about it?

kassambara commented 8 years ago

The future of survminer development is to add more functionalities related to survival analysis.

Adding a support for cox analysis is an excellent proposition and it would be very useful for the scientific community.

I would really appreciate if you can provide a PR about ggsurvplot.cox.zph.

For the pval parameter, I would suggest: pval = TRUE, pval.method = c("logrank", "cox.zph", "Gehan-Breslow", "Tarone-Ware", "Peto-Peto", "modified-Peto-Peto", "Flem-Harr").

Thank you in advance, A. Kassambara

MarcinKosinski commented 8 years ago

Great. Wish me luck then :)

2016-02-23 8:08 GMT+01:00 Alboukadel KASSAMBARA notifications@github.com:

The future of survminer development is to add more functionalities related to survival analysis.

Adding a support for cox analysis is an excellent proposition and it would be very useful for the scientific community.

I would really appreciate if you can provide a PR about ggsurvplot.cox.zph.

For the pval parameter, I would suggest: pval = TRUE, pval.method = c("logrank", "cox.zph", "Gehan-Breslow", "Tarone-Ware", "Peto-Peto", "modified-Peto-Peto", "Flem-Harr").

Thank you in advance, A. Kassambara

— Reply to this email directly or view it on GitHub https://github.com/kassambara/survminer/issues/13#issuecomment-187577038 .

MarcinKosinski commented 8 years ago

Based on the code of survival:::plot.cox.zph() I have prepared a ggplot-like equivalent of this function.

plotMe <- function (x, resid = TRUE, se = TRUE, df = 4, nsmo = 40, var){
  xx <- x$x
  yy <- x$y
  d <- nrow(yy)
  df <- max(df)
  nvar <- ncol(yy)
  pred.x <- seq(from = min(xx), to = max(xx), length = nsmo)
  temp <- c(pred.x, xx)
  lmat <- splines::ns(temp, df = df, intercept = TRUE)
  pmat <- lmat[1:nsmo, ]
  xmat <- lmat[-(1:nsmo), ]
  qmat <- qr(xmat)
  if (qmat$rank < df) 
    stop("Spline fit is singular, try a smaller degrees of freedom")
  if (se) {
    bk <- backsolve(qmat$qr[1:df, 1:df], diag(df))
    xtx <- bk %*% t(bk)
    seval <- d * ((pmat %*% xtx) * pmat) %*% rep(1, df)
  }
  ylab <- paste("Beta(t) for", dimnames(yy)[[2]])
  if (missing(var)) 
    var <- 1:nvar
  else {
    if (is.character(var)) 
      var <- match(var, dimnames(yy)[[2]])
    if (any(is.na(var)) || max(var) > nvar || min(var) < 
          1) 
      stop("Invalid variable requested")
  }
  if (x$transform == "log") {
    xx <- exp(xx)
    pred.x <- exp(pred.x)
  }
  else if (x$transform != "identity") {
    xtime <- as.numeric(dimnames(yy)[[1]])
    indx <- !duplicated(xx)
    apr1 <- approx(xx[indx], xtime[indx], seq(min(xx), max(xx), 
                                              length = 17)[2 * (1:8)])
    temp <- signif(apr1$y, 2)
    apr2 <- approx(xtime[indx], xx[indx], temp)
    xaxisval <- apr2$y
    xaxislab <- rep("", 8)
    for (i in 1:8) xaxislab[i] <- format(temp[i])
  }
  plots <- list()
  lapply(var, function(i) {
    invisible(round(x$table[i, 3],4) -> pval)
    ggplot() + ggtitle(paste0('Schoenfeld Individual Test p: ', pval)) -> gplot
    y <- yy[, i]
    yhat <- pmat %*% qr.coef(qmat, y)
    if (resid) 
      yr <- range(yhat, y)
    else yr <- range(yhat)
    if (se) {
      temp <- 2 * sqrt(x$var[i, i] * seval)
      yup <- yhat + temp
      ylow <- yhat - temp
      yr <- range(yr, yup, ylow)
    }
    if (x$transform == "identity") {
      gplot + geom_line(aes(x=pred.x, y=yhat)) +
        xlab("Time") +
        ylab(ylab[i]) +
        ylim(yr) -> gplot
    } else if (x$transform == "log") {
      gplot + geom_line(aes(x=log(pred.x), y=yhat)) +
        xlab("Time") +
        ylab(ylab[i]) +
        ylim(yr)  -> gplot
    } else {
      gplot + geom_line(aes(x=pred.x, y=yhat)) +
        xlab("Time") +
        ylab(ylab[i]) +
        scale_x_continuous(breaks = xaxisval,
                           labels = xaxislab) +
        ylim(yr)-> gplot
    }

    if (resid) 
      gplot <- gplot + geom_point(aes(x = xx, y =y), col = "red")

    if (se) {
      gplot <- gplot + geom_line(aes(x=pred.x, y=yup), lty = "dashed") +
        geom_line(aes( x = pred.x, y = ylow), lty = "dashed")
    }

  }) -> plots
  names(plots) <- dimnames(yy)[[2]]
  plots
}

library(survival)
fit <- coxph(Surv(futime, fustat) ~ age + ecog.ps,  
             data=ovarian) 
temp <- cox.zph(fit) 
library(ggplot2)
plotMe(temp) -> plots

library(gridExtra)
grid.arrange(plots[[1]], plots[[2]])

plot_gg

In comparison to regular plot

par(mfrow=c(2,1))
plot(temp)

plot_base

Discussion

I am wondering how this should work?

IMHO it should be an additional function ggcoxzph that would work on coxph object or cox.zph object. What do you think about that? Then the customization could be done like in ggsurvplot .

kassambara commented 8 years ago

I would go for an additional function ggcoxzph !!

MarcinKosinski commented 8 years ago

moved to #22

MarcinKosinski commented 8 years ago

Shall we extend README.md with this new plot? Or we might prepare a gh-page for this repo in jekyll like this one http://statistics.rainandrhino.org/knitr-hyde/ with sub-pages for each new function :)?

MarcinKosinski commented 8 years ago

@kassambara do you think it's possible to somehow add main (with the global shoenfeld p-value) to thegrid.arrange in print.ggcoxzph ?

kassambara commented 8 years ago

Resolved now by adding the following R code in print.ggcoxzph (). gridExtra (>=2.0) required for using the additional "top" argument (see below)--> so I updated the DESCRIPTION file

 main <- paste0("Global Schoenfeld Test p: ", signif(pval, 4), "\n")
  do.call(gridExtra::grid.arrange, c(grobs, top = main))
MarcinKosinski commented 8 years ago

GREAT! :heart_eyes: So I'll upadte print.coxphfunctional with the same solution :)