therneau / survival

Survival package for R

Penalised Cox regression implementation #109

Closed: lucharo closed this 3 years ago

lucharo commented 4 years ago

TL;DR

I would be up for taking on the challenge of enabling Cox regularisation and submitting a pull request for this feature, though I may need some help with interpreting the C code (I have experience with C and C++, but I would need someone to point me to where to look to implement regularisation).

Long version

In one of my recent projects, I wanted to use Cox regression with regularisation penalties (LASSO, ridge or elastic net). In my particular application, I wanted to model survival given start and stop vectors inside the Surv object; because of this I could not use the cv.glmnet function from the glmnet package (which only accepts Surv objects with a single time vector). Unfortunately, I could not add a regularisation penalty with the survival package and had to resort to the penalized package. That package gave me most of the functionality of the survival package, but its implementation sometimes differs; e.g. when modelling the effect of categorical variables and their interactions with time, the estimates differed between the penalized and the survival coxph() implementations.

In the end, I performed cross-validation over different values of the regularisation parameter lambda with the penalized package, then used the coefficients estimated for the optimal choice of lambda to fit a model with the coxph function, which I evaluated with the concordance metric.
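For concreteness, a minimal sketch of that workflow (my illustration, assuming a data frame df with start/stop columns time1 and time2, an event indicator status, and predictors x1, x2, x3 -- all hypothetical names, and the exact penalized calls may need adjusting for a start-stop response):

library(survival)
library(penalized)

## 1. Cross-validate the L1 penalty weight lambda with the penalized package.
fit_cv <- optL1(Surv(time1, time2, status), penalized = ~ x1 + x2 + x3,
                data = df, fold = 10)

## 2. Refit with coxph, holding the coefficients fixed at the penalized
##    estimates: init supplies them, iter.max = 0 keeps Newton-Raphson there.
fit_cox <- coxph(Surv(time1, time2, status) ~ x1 + x2 + x3, data = df,
                 init = coefficients(fit_cv$fullfit, "penalized"),
                 control = coxph.control(iter.max = 0))

## 3. Evaluate with the concordance metric.
concordance(fit_cox)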

This was good enough for my task but very inefficient (I find coxph to be one of the fastest implementations of Cox regression, and definitely the most versatile). The only things missing for me are the ability to add regularisation penalties and perhaps a cross-validation function.


therneau commented 4 years ago

This is one of those "I'll answer it tomorrow" questions that I then forgot. The synopsis:

  1. I have been completely occupied with adding multi-state capabilities to the survival package.
  2. I think that penalized modeling is important in survival, and even more important in the multi-state case.
  3. The right place to do this is in the coxme package, which is long overdue for an update.
  4. I would be delighted to have a working group for that.
     a. Realize that this could be a lifetime commitment -- there is a lot to do, and a lot of interested users.
     b. I have thought hard about how to approach the issue, but simply don't have enough time.

Let's have a conversation. therneau@mayo.edu

therneau commented 4 years ago

A few more details.
Penalized models often have a lot of variables; in fact, hundreds of variables is one of the common motivations. Any code for general penalized models should expect this case. It is also true that the design matrix X will normally be sparse. This means that sparse matrix computation should be built in from the start; it is critical for decent speed.

For a penalized Cox model, there are efficient ways to get the linear predictor X beta, the log partial likelihood (LPL), and the first derivative of the LPL using sparse methods. But not the second derivative; that one is a bear.
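To illustrate what the sparse methods buy (a sketch of mine, not the survival or coxme internals; it assumes no tied event times, right censoring only, and rows sorted by increasing time so that the risk set of subject i is rows i..n):

library(Matrix)

n <- 1000; p <- 500
X <- rsparsematrix(n, p, density = 0.02)    # sparse design matrix
beta <- rnorm(p, sd = 0.1)
time <- sort(rexp(n)); status <- rbinom(n, 1, 0.7)

eta  <- as.numeric(X %*% beta)              # linear predictor: one sparse product
risk <- exp(eta)
S0   <- rev(cumsum(rev(risk)))              # sum of exp(eta) over each risk set
lpl  <- sum(status * (eta - log(S0)))       # log partial likelihood

## The score (first derivative of the LPL) can be written as t(X) %*% m,
## where m is the vector of martingale residuals: one more sparse product.
H0   <- cumsum(status / S0)                 # Breslow cumulative hazard at each time
mres <- status - risk * H0                  # martingale residuals
U    <- as.numeric(crossprod(X, mres))      # score vector, length p

The second derivative has no such reduction: it involves a weighted covariance of the rows of X over every risk set, which densifies the computation.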

The current code has a subtle method for fast second derivatives, but it only works for one particular case (a random intercept per group). That leads to somewhat schizophrenic code: for some problems coxme is blazingly fast, for most it is deathly slow. The trick will not extend to complex models.

One promising path forward is to replace Newton-Raphson iteration with a method that only needs first derivatives. My vote would be Hamiltonian MCMC. Cox models have a very quadratic loglik function and I think it will work very well. Where can we borrow/steal a well worked out Hamiltonian library? Contrary to popular belief, you don't need Bayesian priors for Hamiltonian MCMC.
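For reference, the heart of a Hamiltonian step is the leapfrog integrator, which indeed needs only first derivatives. A minimal sketch (grad_lpl is a hypothetical function returning the first derivative of the LPL at beta; a real sampler would add momentum refreshment and an accept/reject step):

leapfrog <- function(beta, momentum, eps, L, grad_lpl) {
  momentum <- momentum + eps / 2 * grad_lpl(beta)   # initial half step
  for (l in seq_len(L)) {
    beta <- beta + eps * momentum                   # full step for position
    if (l < L)
      momentum <- momentum + eps * grad_lpl(beta)   # full step for momentum
  }
  momentum <- momentum + eps / 2 * grad_lpl(beta)   # final half step
  list(beta = beta, momentum = momentum)            # gradients only, no Hessian
}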

lucharo commented 4 years ago

Thank you for your message @therneau! I'll be in touch by e-mail.

jwallib commented 3 years ago

On this topic, may I ask why there is an implementation of ridge regularization but not the lasso? I have tried modifying the ridge function by swapping out (and similarly for the scaled case)

list(penalty = sum(coef^2) * theta / 2,  # ridge penalty value
     first   = theta * coef,             # first derivative
     second  = theta,                    # second derivative (constant)
     flag    = FALSE)

for

sign_coef <- sign(coef)
sign_coef[sign_coef == 0] <- 1              # treat the subgradient of |x| at 0 as +1
delta_coef <- 1e10 * as.numeric(coef == 0)  # huge curvature at 0, to pin zeros there
list(penalty = sum(abs(coef)) * theta,      # lasso penalty value
     first   = theta * sign_coef,           # first derivative (subgradient)
     second  = delta_coef,
     flag    = FALSE)

but this does not seem to push some of the coefficients to 0, certainly not as effectively as the glmnet package does. I am, however, limited when using the glmnet package, as I would like to cluster individuals, which is possible in the survival package.

therneau commented 3 years ago

The code for penalized models in coxph depends completely on Newton-Raphson maximization, and that approach simply does not work for the lasso. Again, coxme is the place to add this, but I have another year's iteration ahead on the multi-state material in coxph. Only then will I again turn my attention to penalized code.
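The reason, in brief: the lasso penalty |coef| is not differentiable at zero, so a Newton-Raphson step essentially never lands exactly on zero. Coordinate-descent methods such as glmnet's instead apply a soft-thresholding operator, which maps a whole interval of values to an exact zero. A minimal sketch of that operator (z is an unpenalized coordinate estimate, lambda the penalty weight):

soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)
soft_threshold(c(-0.30, 0.05, 0.80), lambda = 0.1)
## -0.2  0.0  0.7   -- the middle coefficient lands exactly on 0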