stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

Implement survival models #2

Open jpiironen opened 7 years ago

jpiironen commented 7 years ago

We need to implement the projection for survival models at some point. The actual implementation needs some thinking and is likely to depend on how survival models are (or will be) handled in rstanarm (issue https://github.com/stan-dev/rstanarm/issues/69) and in the survival package. However, this mainly concerns how the feature appears to the user. Regarding the projection itself, there should not be too many difficulties, provided the likelihood is log-concave.
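To make the log-concavity condition concrete for the proportional-hazards case, here is a short derivation. It assumes a fixed (known) baseline hazard h_0 with cumulative hazard H_0, an event indicator δ_i (1 = event, 0 = right censored) and a linear predictor η_i = x_iᵀβ; this notation is introduced here for illustration and is not taken from the thread.

$$
\begin{aligned}
\ell_i(\beta) &= \delta_i\left[\log h_0(t_i) + \eta_i\right] - H_0(t_i)\, e^{\eta_i}, \qquad \eta_i = x_i^\top \beta,\\
\frac{\partial^2 \ell_i}{\partial \beta\, \partial \beta^\top} &= -H_0(t_i)\, e^{\eta_i}\, x_i x_i^\top \preceq 0 .
\end{aligned}
$$

Under these assumptions each right-censored contribution is concave in β. If baseline-hazard parameters (for example a Weibull shape) are estimated jointly with β, concavity in all parameters has to be checked separately.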

csetraynor commented 6 years ago

Implementing survreg shouldn't be difficult: as you say, the full-data likelihood is log-concave, so that is not a problem. What is definitely trickier is coxph, which is actually the most widely used model in my experience. As far as I know, the Poisson trick is needed to translate coxph into a Bayesian model.
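The "Poisson trick" referred to here is usually the classical equivalence between a proportional-hazards model with a piecewise-constant baseline hazard and a Poisson GLM fitted to data split into intervals, with log exposure time as an offset. Below is a minimal base-R sketch under that assumption. The helper `split_for_poisson` is hypothetical (not part of projpred, rstanarm or survival; in practice `survival::survSplit` does the splitting step), and the simulated data are only for illustration.

```r
# Hypothetical helper: expand right-censored data (time, status) into one row
# per subject and interval of a piecewise-constant baseline hazard.
split_for_poisson <- function(time, status, cuts) {
  breaks <- c(0, sort(unique(cuts)), Inf)
  start <- head(breaks, -1)
  end <- tail(breaks, -1)
  rows <- lapply(seq_along(time), function(i) {
    at_risk <- start < time[i]                     # intervals entered before exit
    exposure <- pmin(end, time[i]) - start         # time at risk in each interval
    event <- as.integer(status[i] == 1 & time[i] > start & time[i] <= end)
    data.frame(id = i, interval = which(at_risk),
               exposure = exposure[at_risk], event = event[at_risk])
  })
  do.call(rbind, rows)
}

# Simulated example: constant baseline hazard, true log hazard ratio 0.5.
set.seed(1)
n <- 500
x <- rnorm(n)
t_event <- rexp(n, rate = exp(0.5 * x))
t_cens <- rexp(n, rate = 0.3)
time <- pmin(t_event, t_cens)
status <- as.integer(t_event <= t_cens)

cuts <- quantile(time[status == 1], c(0.25, 0.5, 0.75))
long <- split_for_poisson(time, status, cuts)
long$x <- x[long$id]

# One log-rate per interval (baseline hazard) plus the covariate effect.
fit <- glm(event ~ factor(interval) + x + offset(log(exposure)),
           family = poisson(), data = long)
coef(fit)[["x"]]   # estimate of the log hazard ratio
```

With a cut at every distinct event time, this construction reproduces the Cox partial-likelihood estimate of the coefficients; coarser cuts, as above, give a piecewise-exponential approximation instead.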

avehtari commented 6 years ago

What it's for sure more tricky is the coxph , which is actually the most widely used model in my experience.

I have not seen coxph being used with a large number of predictors. @csetraynor, if you know of such papers, please post them here or email me.

csetraynor commented 6 years ago

Hi @avehtari, well, I agree I have not yet read any paper that uses a Cox Bayesian model. However, @jburos has done excellent work in that direction: https://github.com/hammerlab/survivalstan

Besides, the frequentist literature is, of course, rich; just as an example, here is this excellent paper: https://bit.ly/2JJibtG (Bøvelstad, 2007).

avehtari commented 6 years ago

hi @csetraynor

well, I agree I have not yet read any paper that uses a Cox Bayesian model

I wrote "I have not seen coxph being used with a large number of predictors". I have seen it used with a small number of predictors. See, e.g.,

With a large number of predictors, it's likely that there is not enough information to learn conditional distributions more elaborate than those of parametric models. See, e.g.,

I think we should have both types of survival models in some Stan survival model package, but for projpred, which deals with a large number of predictors, the priority is parametric survival models (with censoring).

csetraynor commented 5 years ago

Hi, I would like to apologise for my previous comment in this thread: it was a while ago, and at the time I really did not understand what this package was doing (though I still have some trouble fully understanding the theory: it is a projection of the posterior, right?). I am really looking forward to the implementation of mixed models and how that will play out with stan_jm. I would really like to collaborate on stan_surv and stan_jm, but I am struggling to understand the theory.

csetraynor commented 5 years ago

OK, after re-reading your paper, I think I understand the part about maximising the log-posterior-likelihood; am I correct? Would a good start then be to plug in the log-likelihood for a survival model as expressed by Sam Brilleman in the survival branch of rstanarm? https://github.com/stan-dev/rstanarm/blob/feature/survival/R/log_lik.R In the simplest setting of right-censored data, this amounts to:

if (status == 1) {
  # uncensored: log-likelihood = log h(t) + log S(t)
  args$times <- data_i$t_end
  lhaz  <- do.call(evaluate_log_basehaz,  args) + eta       # log h(t) = log h0(t) + eta
  lsurv <- do.call(evaluate_log_basesurv, args) * exp(eta)  # log S(t) = log S0(t) * exp(eta)
  ll <- lhaz + lsurv
} else if (status == 0) {
  # right censored: log-likelihood = log S(t)
  args$times <- data_i$t_end
  lsurv <- do.call(evaluate_log_basesurv, args) * exp(eta)
  ll <- lsurv
}
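For a concrete, self-contained instance of the same structure, the sketch below assumes a Weibull proportional-hazards baseline h0(t) = shape * t^(shape - 1), with any scale parameter absorbed into the intercept of `eta`. The function name `weibull_ph_loglik` is hypothetical; it does not call rstanarm's internal `evaluate_log_basehaz`/`evaluate_log_basesurv` helpers, it just writes out the baseline terms explicitly.

```r
# Hypothetical helper: per-observation log-likelihood for right-censored data
# under a Weibull PH model with baseline hazard h0(t) = shape * t^(shape - 1).
weibull_ph_loglik <- function(t, status, eta, shape) {
  log_basehaz  <- log(shape) + (shape - 1) * log(t)  # log h0(t)
  log_basesurv <- -t^shape                           # log S0(t) = -H0(t)
  lhaz  <- log_basehaz + eta                         # log h(t)
  lsurv <- log_basesurv * exp(eta)                   # log S(t)
  # status == 1: event observed; status == 0: right censored
  ifelse(status == 1, lhaz + lsurv, lsurv)
}

weibull_ph_loglik(t = c(0.5, 1.2, 2), status = c(1, 0, 1),
                  eta = c(-0.3, 0.1, 0.4), shape = 1.5)
```

This model is a Weibull distribution with shape `shape` and scale `exp(-eta / shape)`, so events contribute `dweibull(..., log = TRUE)` and censored observations contribute `pweibull(..., lower.tail = FALSE, log.p = TRUE)`.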

I also have a (classical) technical question: is it then OK to use the censoring indicators and observed times in the observed data for projection predictive varsel? In plain words, the survival model does not predict the exact time at which the event happens, only the probability that it happens.

csetraynor commented 4 years ago

Hello, I have been looking into this more recently. First, for the most common parametric models, i.e. Weibull, exponential and Gompertz, computing pseudo-observations is an easy task, as one can rely on the extreme value distribution to predict the event times. The pseudo-observations could be assumed to be uncensored; although this may be a stretch in some situations, such as competing risks, it should hold for classical survival analysis. So my question is: which approach would you recommend as easiest to integrate into the current structure of the package, a Newton-Raphson routine or possibly a polynomial approximation?
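As an illustration of "relying on the extreme value distribution", here is a minimal sketch for the Weibull case (the exponential is the special case `sigma = 1`). It assumes the survreg-style accelerated-failure-time form log T = eta + sigma * W, where W follows the standard (minimum) extreme value distribution; the helper name is hypothetical and the sketch does not cover Gompertz.

```r
# Hypothetical helper: draw uncensored event-time pseudo-observations from a
# Weibull AFT model, log T = eta + sigma * W, W ~ standard minimum extreme value.
draw_weibull_pseudo_obs <- function(eta, sigma) {
  u <- runif(length(eta))
  w <- log(-log(u))          # inverse-CDF draw: P(W <= w) = 1 - exp(-exp(w))
  exp(eta + sigma * w)       # event times; no censoring is imposed
}

set.seed(42)
t_pseudo <- draw_weibull_pseudo_obs(eta = rep(log(2), 1e5), sigma = 0.5)
```

These draws follow the same distribution as `rweibull(n, shape = 1 / sigma, scale = exp(eta))`, which is the mapping between survreg's parameters and R's Weibull parameters; for the values above, the theoretical median is `2 * log(2)^0.5`.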

avehtari commented 4 years ago

Hi @csetraynor. I'm sorry I missed your 5 July post. We are right now refactoring projpred and it is better to work with the develop branch, but there can be breaking changes, so it may also be good to wait a few weeks. I need to talk with @paul-buerkner and @AlejandroCatalina about the best way to proceed with this, and I'll get back to you in a few days.

csetraynor commented 4 years ago

Hi Aki (@avehtari), I moved my question to Discourse, as I thought it might fit better there: https://discourse.mc-stan.org/t/using-projpred-for-survival-and-bespoke-models/13655

csetraynor commented 4 years ago

Hi, I wanted to add to this thread, and to the discussion we had on the forums a while ago, that I have finally tried out the idea of projecting the latent factors for survival models, and it works well. I have added a few hacks to use a 'latent_factor_dev' parameter that projects the latent factors using the fast Gaussian projection. It is very convenient, and my first tests with simulated data check out. I am adding further tests before submitting a PR, but the work in progress is currently in this branch: https://github.com/csetraynor/projpred/tree/survival-latent-feature. Thanks for the tip, @avehtari!
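A minimal base-R sketch of the latent-projection idea described in this comment, assuming the reference model's posterior draws of the linear predictor are available as an S × n matrix. With a Gaussian latent space, projecting each draw onto a submodel reduces to a least-squares fit of that draw on the submodel's design matrix. The function `project_latent` and its arguments are hypothetical; this is neither projpred's implementation nor the `latent_factor_dev` code in the linked branch, and it omits the projection of the latent residual variance.

```r
# Hypothetical helper: project reference-model draws of the latent linear
# predictor onto a submodel that keeps only some predictor columns.
# eta_ref: S x n matrix (one row per posterior draw)
# X:       n x p design matrix; vars: column indices kept in the submodel
project_latent <- function(eta_ref, X, vars) {
  X_sub <- cbind(1, X[, vars, drop = FALSE])   # intercept + selected columns
  # Least-squares fit of every draw at once: coefficients are (1 + |vars|) x S
  beta <- qr.coef(qr(X_sub), t(eta_ref))
  eta_sub <- t(X_sub %*% beta)                 # S x n projected linear predictor
  list(beta = t(beta), eta = eta_sub)
}
```

The projected draws `eta` can then be pushed through the survival likelihood (for example a censored Weibull one) to evaluate the predictive performance of the submodel.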

csetraynor commented 4 years ago

Edit to my previous post: a small caveat is that I am preparing it only for the use case of a stan_surv object fitted with rstanarm, so the methods are subject to changes in the rstanarm survival branch.

AlejandroCatalina commented 4 years ago

Hi @csetraynor and thank you for your interest and work in projpred.

We are currently making significant changes to how projpred works (you can take a look at the develop branch), so it might be better for you to wait a couple of weeks before posting a PR, because it's quite likely that some of the changes affect how your code works at the innermost level. You can of course ask us for help refactoring the code you've already written when we release the new version!

csetraynor commented 4 years ago

Hi @AlejandroCatalina, thanks for the good work on the alpha release. I am happy to collaborate; I am doing comparisons of varsel for survival models as part of my thesis. In the develop branch, survival models are still not implemented, right? The latent factor approach worked quite well on synthetic data. Sorry for the late reply; I was out of office and did not see the comment.

AlejandroCatalina commented 4 years ago

We are currently studying the latent function approach, and it seems very promising for any general model, so it's very likely that we will push in that direction. Given that we already have some tests, we might have something relatively soon. In any case, you are correct that we haven't pushed survival models in particular so far, but we have several open lines of work, including survival and ordinal models.

avehtari commented 2 years ago

The latent_projection branch now has an example of a survival model with censored time-to-event data: https://github.com/stan-dev/projpred/blob/latent_projection/vignettes/latent.Rmd. As soon as that is merged, we can close this issue.