Closed MarcinKosinski closed 8 years ago
The future of survminer development is to add more functionalities related to survival analysis.
Adding a support for cox analysis is an excellent proposition and it would be very useful for the scientific community.
I would really appreciate if you can provide a PR about ggsurvplot.cox.zph.
For the pval parameter, I would suggest: pval = TRUE, pval.method = c("logrank", "cox.zph", "Gehan-Breslow", "Tarone-Ware", "Peto-Peto", "modified-Peto-Peto", "Flem-Harr").
Thank you in advance, A. Kassambara
Great. Wish me luck then :)
2016-02-23 8:08 GMT+01:00 Alboukadel KASSAMBARA notifications@github.com:
The future of survminer development is to add more functionalities related to survival analysis.
Adding a support for cox analysis is an excellent proposition and it would be very useful for the scientific community.
I would really appreciate if you can provide a PR about ggsurvplot.cox.zph.
For the pval parameter, I would suggest: pval = TRUE, pval.method = c("logrank", "cox.zph", "Gehan-Breslow", "Tarone-Ware", "Peto-Peto", "modified-Peto-Peto", "Flem-Harr").
Thank you in advance, A. Kassambara
— Reply to this email directly or view it on GitHub https://github.com/kassambara/survminer/issues/13#issuecomment-187577038 .
Based on the code of survival:::plot.cox.zph()
I have prepared a ggplot
-like equivalent of this function.
plotMe <- function (x, resid = TRUE, se = TRUE, df = 4, nsmo = 40, var){
xx <- x$x
yy <- x$y
d <- nrow(yy)
df <- max(df)
nvar <- ncol(yy)
pred.x <- seq(from = min(xx), to = max(xx), length = nsmo)
temp <- c(pred.x, xx)
lmat <- splines::ns(temp, df = df, intercept = TRUE)
pmat <- lmat[1:nsmo, ]
xmat <- lmat[-(1:nsmo), ]
qmat <- qr(xmat)
if (qmat$rank < df)
stop("Spline fit is singular, try a smaller degrees of freedom")
if (se) {
bk <- backsolve(qmat$qr[1:df, 1:df], diag(df))
xtx <- bk %*% t(bk)
seval <- d * ((pmat %*% xtx) * pmat) %*% rep(1, df)
}
ylab <- paste("Beta(t) for", dimnames(yy)[[2]])
if (missing(var))
var <- 1:nvar
else {
if (is.character(var))
var <- match(var, dimnames(yy)[[2]])
if (any(is.na(var)) || max(var) > nvar || min(var) <
1)
stop("Invalid variable requested")
}
if (x$transform == "log") {
xx <- exp(xx)
pred.x <- exp(pred.x)
}
else if (x$transform != "identity") {
xtime <- as.numeric(dimnames(yy)[[1]])
indx <- !duplicated(xx)
apr1 <- approx(xx[indx], xtime[indx], seq(min(xx), max(xx),
length = 17)[2 * (1:8)])
temp <- signif(apr1$y, 2)
apr2 <- approx(xtime[indx], xx[indx], temp)
xaxisval <- apr2$y
xaxislab <- rep("", 8)
for (i in 1:8) xaxislab[i] <- format(temp[i])
}
plots <- list()
lapply(var, function(i) {
invisible(round(x$table[i, 3],4) -> pval)
ggplot() + ggtitle(paste0('Schoenfeld Individual Test p: ', pval)) -> gplot
y <- yy[, i]
yhat <- pmat %*% qr.coef(qmat, y)
if (resid)
yr <- range(yhat, y)
else yr <- range(yhat)
if (se) {
temp <- 2 * sqrt(x$var[i, i] * seval)
yup <- yhat + temp
ylow <- yhat - temp
yr <- range(yr, yup, ylow)
}
if (x$transform == "identity") {
gplot + geom_line(aes(x=pred.x, y=yhat)) +
xlab("Time") +
ylab(ylab[i]) +
ylim(yr) -> gplot
} else if (x$transform == "log") {
gplot + geom_line(aes(x=log(pred.x), y=yhat)) +
xlab("Time") +
ylab(ylab[i]) +
ylim(yr) -> gplot
} else {
gplot + geom_line(aes(x=pred.x, y=yhat)) +
xlab("Time") +
ylab(ylab[i]) +
scale_x_continuous(breaks = xaxisval,
labels = xaxislab) +
ylim(yr)-> gplot
}
if (resid)
gplot <- gplot + geom_point(aes(x = xx, y =y), col = "red")
if (se) {
gplot <- gplot + geom_line(aes(x=pred.x, y=yup), lty = "dashed") +
geom_line(aes( x = pred.x, y = ylow), lty = "dashed")
}
}) -> plots
names(plots) <- dimnames(yy)[[2]]
plots
}
library(survival)
fit <- coxph(Surv(futime, fustat) ~ age + ecog.ps,
data=ovarian)
temp <- cox.zph(fit)
library(ggplot2)
plotMe(temp) -> plots
library(gridExtra)
grid.arrange(plots[[1]], plots[[2]])
In comparison to regular plot
par(mfrow=c(2,1))
plot(temp)
I am wondering how this should work?
survminer
have additional function ?survminer::ggsurvplot
have additional method ?survminer::ggsurvplot
have additional parameter cox.zph = TRUE
that would trigger different plot ?IMHO it should be an additional function ggcoxzph
that would work on coxph
object or cox.zph
object. What do you think about that? Then the customization could be done like in ggsurvplot
.
I would go for an additional function ggcoxzph
!!
moved to #22
Shall we extend README.md
with this new plot?
Or we might prepare a gh-page for this repo in jekyll like this one
http://statistics.rainandrhino.org/knitr-hyde/ with sub-pages for each new function :)?
@kassambara do you think it's possible to somehow add main
(with the global shoenfeld p-value) to thegrid.arrange
in print.ggcoxzph
?
Resolved now by adding the following R code in print.ggcoxzph (). gridExtra (>=2.0) required for using the additional "top" argument (see below)--> so I updated the DESCRIPTION file
main <- paste0("Global Schoenfeld Test p: ", signif(pval, 4), "\n")
do.call(gridExtra::grid.arrange, c(grobs, top = main))
GREAT! :heart_eyes:
So I'll upadte print.coxphfunctional
with the same solution :)
So far
survminer
provides a great tool to display the p-value of the log-rank test for plotted Kaplan-Meier estimates of the survival curves divided on strata.In many cases the next step in survival analysis, after plotting the survival curves, is fitting the Cox proportional hazards model. I have seen many research posters where biologists shows plots of survival curves that intersect but the p-value of log-rank test allows for the rejection of the null hypothesis (but of the log-rank TEST!) so researchers proceed with Cox model.
Maybe if survminer would support also Cox proportional hazards assumption test that is based on Schoenfeld Residuals, then one could also be aware of the p-value of the assumption test for Cox model https://stat.ethz.ch/R-manual/R-devel/library/survival/html/cox.zph.html.
If thats suits you then
pval
parameter could be changed topval.logrank
and the new parameterpval.cox.zph
could be added and both p-values could be printed somehow. If that suits you then I could prepare a PR.I know that in most cases survival curves are plotted separately for each variable and the Schoenfeld test should be done for all variables together but this functionality could be useful if one would fit the cox model with 1 variable.
Other version of this proposition
Prepare
ggsurvplot.cox.zph
function (or it's overload for this class of object) that would usegrid.arrange
to plot many survival curves for different variables (one above another, with 1 risk table at the bottom) and would add a p-value of each variable of the Schoenfeld test to the correspoding survival plot and would add global p-value of the test to tne bottow of the graph. This functionality could be great for models with 2-10 variables, and the output could fit on 1 A4 page of article.What do you think about that? Would you like such functionality and maybe a sweet PR about it?