kassambara / survminer

Survival Analysis and Visualization
https://rpkgs.datanovia.com/survminer/
506 stars 162 forks source link

ggcoxfunctional : functional form of continuous variable in cox model #14

Closed MarcinKosinski closed 8 years ago

MarcinKosinski commented 8 years ago

The survival analysis is still a dynamic area of research which in many cases require complex pre-analysis of assumptions. One assumption of the most popular survival analysis model, the cox proportional hazards model, states that the continuous variable need to be in a linear correlation with the observed times of events. This is know as looking for the functional form of continuous variable in cox model.

This assumption can be checked visually, and if this is not stated then the below graphs might propose what should be the transformation of continuous variable in the model

library(archivist)
aread('MarcinKosinski/Museum/0c289d101131d74ae31429c') -> exampleSurvData
library(survival)
plot(exampleSurvData$size,
     resid(coxph(Surv(rectime,censrec)~1, data = exampleSurvData)))
lines(lowess(exampleSurvData$size,
             resid(coxph(Surv(rectime,censrec)~1, data = exampleSurvData)),
             iter = 0,
             f = 0.6))

cox_pre

One might plot the martingale residuals of the null model vs continuous explanatory variables to see that the logarithm could be applied to this explanatory variable to provide a more linear form:

plot(log(exampleSurvData$size),
     resid(coxph(Surv(rectime,censrec)~1, data = exampleSurvData)))
lines(lowess(log(exampleSurvData$size),
             resid(coxph(Surv(rectime,censrec)~1, data = exampleSurvData)),
             iter = 0,
             f = 0.6))

cox_post

In my opinion this is a crucial tool in cox model but the proposed plot is ugly and not publication-ready. I think that recreating such plot in a ggplot is easy to obtain

library(ggplot2)

exampleSurvDataMunged <- data.frame(log_size = log(exampleSurvData$size),
                                    martingale_resid = resid(coxph(Surv(rectime,censrec)~1, data = exampleSurvData)),
                                    log_size_lowess = lowess(log(exampleSurvData$size),
                                                             resid(coxph(Surv(rectime,censrec)~1, data = exampleSurvData)),
                                                             iter = 0,
                                                             f = 0.6)$x,
                                    martingale_resid_lowess = lowess(log(exampleSurvData$size),
                                                                     resid(coxph(Surv(rectime,censrec)~1, data = exampleSurvData)),
                                                                     iter = 0,
                                                                     f = 0.6)$y)
library(ggthemes)
ggplot(exampleSurvDataMunged, 
       aes(x = log_size, 
           y = martingale_resid),
       col = log_size > martingale_resid) +
   geom_point(alpha = 0.5) + 
   geom_line(aes(log_size_lowess, martingale_resid_lowess)) +
   theme_igray(base_family = "mono", base_size = 14) 

cox_ggplot

And the whole analysis became more professional and elegant. What would you think about adding such functionality to the survminer package as en extra function ggcoxfunctionalplot (long name, but couldn't come with better one) ?

kassambara commented 8 years ago

I'll work on that:-)!

MarcinKosinski commented 8 years ago

You could work on adding other tests than log-rank and I'll could provide PR for this function. What would you say about the name ggcoxfunctional ?

kassambara commented 8 years ago

Ok no problem! I'll work on adding other tests and the name ggcoxfunctional is fine for me:)!

MarcinKosinski commented 8 years ago

moved to #23

MarcinKosinski commented 8 years ago

Few TODOS:

kassambara commented 8 years ago

Thanks Marcin:-)!

For 2): May be, playing with the functions all.vars(formula) and terms(formula), could help...

library(survival)

# Fit cox
data(mgus)
formula <- Surv(futime, death) ~ mspike+log(mspike) + I(mspike)^2
SurvFormula <- deparse(formula[[2]])
covariate_terms <- attr(terms(formula), "term.labels")
for( i in covariate_terms){
  cox.model <- coxph(as.formula(paste0(SurvFormula, " ~ ", i)), data = mgus)
  print(summary(cox.model))
}

# formula2 : heart data in survival package
formula2 <- Surv(start, stop, event) ~ (age + surgery)* transplant
all.vars(formula2)
MarcinKosinski commented 8 years ago

Thanks for suggestion :) I'll Have a look at this. It will be a great lesson for me.

Marcin Kosinski

Dnia 02.03.2016 o godz. 11:59 Alboukadel KASSAMBARA notifications@github.com napisał(a):

Thanks Marcin:-)!

For 2): May be, playing with the functions all.vars(formula) and terms(formula), could help...

library(survival)

Fit cox

data(mgus) formula <- Surv(futime, death) ~ mspike+log(mspike) + I(mspike)^2 SurvFormula <- deparse(formula[[2]]) covariate_terms <- attr(terms(formula), "term.labels") for( i in covariate_terms){ cox.model <- coxph(as.formula(paste0(SurvFormula, " ~ ", i)), data = mgus) print(summary(cox.model)) }

formula2 : heart data in survival package

formula2 <- Surv(start, stop, event) ~ (age + surgery)* transplant all.vars(formula2) — Reply to this email directly or view it on GitHub.

MarcinKosinski commented 8 years ago

attr(., "term.labels") has the same problem as model.frame function that I have described here http://stackoverflow.com/questions/36155209/model-frame-function-does-not-react-on-asis-object-in-r. The same moment I'll learn how to fix this, I'll come with the PR that resolves mentioned TODOS.

MarcinKosinski commented 8 years ago

Ok I got it. It should be I(mspike^2) instead of I(mspike)^2 :) and the rest works

kassambara commented 8 years ago

Ah ok!! cool:-)!

MarcinKosinski commented 8 years ago

@kassambara I've provided PR #29 with full handling for formula tranformations of variables and I've added an example. Thanks for ideas and suggestions!

Do you think we could somehow change print.ggcoxfunctional (using this http://stackoverflow.com/questions/11076567/plot-a-legend-and-well-spaced-universal-y-axis-and-main-titles-in-grid-arrange) to have only 1 y-axis title (not 1 per plot in grid) and 1 main title? So that ggcoxfunctional object would have an attribute y-title and then print.ggcoxfunctional could use this attribute to provide an y-title for whole grid?

library(survival)
data(mgus)
library(ggplot2)
ggcoxfunctional(Surv(futime, death) ~ mspike + log(mspike) + I(mspike^2) + 
                  age + I(log(age)^2) + I(sqrt(age)), data = mgus,
                point.col = "white", point.alpha = 0.5, ggtheme = theme_dark())

ggcoxfunctional2

kassambara commented 8 years ago

Great job man:-)! Thank you for your work! Having only one y-title would simplify these elegant plots, so it's a good option!

MarcinKosinski commented 8 years ago

Add left y-axis and main title like here : https://github.com/kassambara/survminer/issues/13#issuecomment-200507691

MarcinKosinski commented 8 years ago

I've created PR for this https://github.com/kassambara/survminer/pull/32