therneau / survival

Survival package for R

anova.cph in a nested ANOVA (non-sequential) manner? #215

Closed dankirk95 closed 1 year ago

dankirk95 commented 2 years ago

I tried using anova.coxph from survival to investigate which variables are important to my Cox model outcome. However, the default method is sequential testing, whereas I require non-sequential testing (e.g., a two-model ANOVA or something like drop1()).

Is there any way to do this? Using survival's anova implementation is appealing because I have strata terms that other anova implementations don't seem to be able to handle, and I use pspline() for splines, which rules out anova.rms.

I also thought about calculating the relevant statistics manually (something like this answer suggests). This is fine in theory, but it will take a while to run through all 50 of my variables and doesn't seem very elegant.

therneau commented 2 years ago

You have to give more information. I don't know what a "two-model ANOVA" is, nor what you mean by "I require non-sequential".

dankirk95 commented 2 years ago

Sorry Professor, hopefully I can explain myself a bit better here. I think the answer linked here gets to the core of my question perfectly.

survival's anova.coxph "gives a sequential analysis of deviance table....each term of the formula is added in turn are given in as the rows of a table". This is sequential, but what I want is a "non-sequential" version of this anova, i.e., I want to compare the full model with a reduced model (like this) for each of my variables, so that I can see which are important for my survival model, both in terms of effect size (estimated by likelihood or chi-square) and significance (which I will calculate in a subsequent step).

As I state in the original post, I can't seem to use other anova implementations because I have strata. anova.coxph can handle strata but is sequential, and therefore not suitable for my goals.

Any help is greatly appreciated, Professor.

therneau commented 2 years ago

One reason I challenged is that the notion of "each variable adjusting for all the others" is a slippery one. You should have a clear grasp of exactly what you want, statistically, before heading down the path. This has been hampered greatly by SAS with their type I, type II, type III nomenclature. (I spent a lot of time understanding that the SAS type III approach is a complete waste.) There is a vignette "Population Contrasts" in the survival package which discusses some of this; it needs to be updated to reflect that the causal modelers have rediscovered this territory and applied new names (g-estimate). The anova.cph function does something different yet, which I have not completely got my arms around.
You really do need to know what your target is.

dankirk95 commented 2 years ago

Thanks for the response, Professor. I have a clear vision in my head of what I want to do; I just sometimes lack the statistical knowledge to know whether what I am doing does in fact align with it. What I want to ask is:

"If I look at my full model (which has my main treatment of interest and all these other covs) and at the full model minus Xj (a given covariate), is there a statistical and meaningful difference between these two models that tells me that this covariate is actually important for predicting my outcome?"

I then want to repeat that for all of the covs.

In the time that has elapsed, I have settled on doing this manually: running a likelihood ratio test for each variable, collecting the relevant information (chi-square, df, p-value), and presenting it in a table. My reading tells me this is okay to do; I just guessed there would be an existing way to do it so that I didn't have to do it manually.
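
A minimal sketch of that manual approach, for illustration only. It uses the `lung` data that ships with survival as a stand-in for the real model, and the names `dat`, `terms_to_test`, `lrt_one`, and `lrt_table` are hypothetical. Each reduced model is compared with the full model through the two-model form of `anova()`, and the strata term is kept in every fit. All models are fit to the same complete-case data so that the likelihoods are comparable.

```r
library(survival)

# Stand-in data: complete cases only, so every model uses the same rows.
dat <- na.omit(lung[, c("time", "status", "age", "ph.ecog", "wt.loss", "sex")])

# Full model; strata(sex) is an illustrative strata term.
full <- coxph(Surv(time, status) ~ age + ph.ecog + wt.loss + strata(sex),
              data = dat)

# Terms to drop one at a time (the strata term is never dropped).
terms_to_test <- c("age", "ph.ecog", "wt.loss")

# Refit without one term and run a likelihood ratio test against the full model.
lrt_one <- function(term) {
  reduced <- update(full, as.formula(paste(". ~ . -", term)))
  cmp <- anova(reduced, full)          # row 2 holds the comparison
  data.frame(term  = term,
             chisq = cmp$Chisq[2],
             df    = cmp$Df[2],
             p     = cmp[2, ncol(cmp)])  # p-value is the last column
}

lrt_table <- do.call(rbind, lapply(terms_to_test, lrt_one))
print(lrt_table)
```

With 50 covariates, `terms_to_test` would simply list all of them. Note that this treats every term as a main effect; it does not address what dropping a term should mean when interactions are present.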

therneau commented 1 year ago

The issue is what to do with interactions, a subject with a contentious history. My opinions are encoded in the yates() function.
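
For reference, a minimal sketch of calling `yates()` on a Cox model that contains an interaction. The model and the `lung` data are illustrative assumptions, not the poster's analysis, and the object names `fit` and `yt` are hypothetical. See the "Population Contrasts" vignette for what the resulting estimates and tests mean.

```r
library(survival)

# Illustrative Cox model with an interaction between ECOG score and sex.
fit <- coxph(Surv(time, status) ~ factor(ph.ecog) * sex + age, data = lung)

# Population contrast for ph.ecog, averaging over the other covariates;
# predict = "risk" reports results on the hazard ratio scale.
yt <- yates(fit, ~ ph.ecog, predict = "risk")
print(yt)
```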