stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0

Survival models #280

Open sambrilleman opened 6 years ago

sambrilleman commented 6 years ago

I'm opening this issue so there is a forum in which to discuss progress on getting survival models into rstanarm.

I've started a new survival branch here. It currently has a stan_surv modelling function, calling a surv.stan model file. The modelling function returns an object of class stansurv that inherits stanreg.

The current task list looks something like:

ermeel86 commented 6 years ago

Thanks @sambrilleman for setting this up. At this stage I am trying to make sense of your code / approach, so some questions will follow every now and then. Once this is clarified I will move on and try to contribute :)

You mention that for the ms approach

A closed form solution is available for both the hazard and cumulative hazard so this approach should be relatively fast.

as well as

The default locations for the internal knots, as well as the basis terms for the splines, are calculated with respect to log time.

I see how one can get a closed form solution if the splines work directly on the time scale, but if you use the log time scale wouldn't you need to evaluate an integral whose integrand is the product of a weighted sum of the M-splines and exp(integration variable), with the lower integration bound being -\infty and the upper one being \log{t}? I am not aware of a closed form solution like this for M-splines... Maybe I am overlooking something here.
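
To spell out what I mean: writing $h(t) = \sum_k \gamma_k M_k(\log t)$ and substituting $v = \log u$ in the cumulative hazard gives

$$ H(t) = \int_0^t \sum_k \gamma_k M_k(\log u) \, du = \int_{-\infty}^{\log t} \sum_k \gamma_k M_k(v) \, e^{v} \, dv, $$

and it is the extra factor of $e^{v}$ that stops the I-splines (the plain integrals of the M-splines) from giving this integral in closed form.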

jgabry commented 6 years ago

Glad you both are working on this!

sambrilleman commented 6 years ago

Thanks @sambrilleman for setting this up. At this stage I am trying to make sense of your code / approach, so some questions will follow every now and then. Once this is clarified I will move on and try to contribute :)

@ermeel86 Sounds good!

I see how one can get a closed form solution if the splines work directly on the time scale, but if you use the log time scale wouldn't you need to evaluate an integral whose integrand is the product of a weighted sum of the M-splines and exp(integration variable), with the lower integration bound being -\infty and the upper one being \log{t}?

Hmm, maybe you are right (and it is safer to just stick with t, rather than log(t)). But I had the impression (without writing down the math) that transforming the time scale would be somewhat irrelevant from the perspective of integrating the basis terms. The I-splines are still just evaluated as the integral of the M-splines, but with respect to whatever transformation of time you choose. So, if you choose log(time) as the time scale for the splines, then you are modelling h(log(t)) using M-splines, and you are modelling H(log(t)) using I-splines...

(But I may well be missing something here...!)

ermeel86 commented 6 years ago
[image: m_i_spline]
sambrilleman commented 5 years ago

@bgoodri @jgabry I'm hoping you can maybe help me with some methodology and implementation.

For stan_surv I am formulating a regression model defined on the hazard scale, that allows for either time-fixed or time-varying coefficients. In survival analysis, the time-varying coefficients are often referred to as "non-proportional hazards", since the effect of the covariate on the hazard is allowed to vary over time.

I am formulating the time-varying coefficient (say beta_p(t)) as a B-spline function, as shown in Equation 4 of the attached PDF. But I want to penalise the spline coefficients to avoid overfitting. I see that in stan_gamm4 you use a hierarchical smoothing prior for the spline coefficients, but they are not penalised; rather, the coefficients are treated as independent of one another (i.e. each spline coefficient has a univariate normal prior).

But since I am using B-splines (whereas I think stan_gamm4 uses thin plate regression splines), I want to introduce a penalty matrix.

Following Lang and Brezger (https://www.jstor.org/stable/1391151), I think I can do this via the prior distribution defined in Equation 5 of the attached PDF. If I am correct, that prior distribution for the B-spline coefficients basically equates to a multivariate normal distribution with precision matrix equal to the r-th difference penalty matrix scaled by the smoothing SD. The smoothing SD is then given some hyperprior as chosen by the user (e.g. exponential, half-normal, half-t or half-Cauchy -- i.e. the equivalent of prior_smooth in stan_gamm4).
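
(Written out, I believe the Equation 5 prior amounts to something like

$$ p(\beta \mid \tau) \propto \exp\!\left( -\frac{1}{2\tau^{2}} \, \beta' D_r' D_r \, \beta \right), $$

i.e. a -- possibly improper -- multivariate normal on the B-spline coefficients $\beta$, with precision matrix $D_r' D_r / \tau^{2}$, where $D_r$ is the r-th order difference matrix and $\tau$ is the smoothing SD.)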

My question is: I am looking for advice on the implementation of the MVN prior for the B-spline coefficients. You can see my current Stan code here (https://github.com/stan-dev/rstanarm/blob/feature-interval-censored-jm/src/stan_files/surv.stan#L248). It currently shows a univariate normal prior for each of the B-spline coefficients, and a hyperprior for the smoothing SD (i.e. the same as used in stan_gamm4). But the commented out lines (i.e. here: https://github.com/stan-dev/rstanarm/blob/feature-interval-censored-jm/src/stan_files/surv.stan#L251) show my initial attempt at a MVN prior with penalty matrix for the coefficients. That code is probably wrong, and in any case I want to be able to reparameterise the MVN prior using Matt's trick, but don't know how to do that for a precision-based MVN distribution.

Any chance you have advice/direction/anything to add on this? If not, a more general question might be "is it possible to apply Matt's trick to a MVN distribution parameterised in terms of its precision matrix?"

Thanks in advance! notes_for_Ben_and_Jonah.pdf (https://github.com/stan-dev/rstanarm/files/2454965/notes_for_Ben_and_Jonah.pdf)

bgoodri commented 5 years ago

The short answer is that with stan_gamm4, the design matrix for the smooth function has been transformed by an SVD so that its columns are orthogonal, and hence the independent normal priors on the coefficients are arguably appropriate. This involves the diagonalize = TRUE argument to mgcv::jagam: https://github.com/stan-dev/rstanarm/blob/master/R/stan_gamm4.R#L182

sambrilleman commented 5 years ago

@bgoodri Thanks, that makes sense.

But just to elaborate: suppose I wish to give the coefficients an MVN prior with a penalty matrix; is it possible to specify a non-centered parameterisation for that?

The Stan manual shows a MVN non-centered reparameterisation as:

data {
  int<lower=2> K;
  vector[K] mu;
  cov_matrix[K] Sigma;
  ...
transformed data {
  matrix[K, K] L;
  L = cholesky_decompose(Sigma);
}
parameters {
  vector[K] alpha;
  ...
transformed parameters {
  vector[K] beta;
  beta = mu + L * alpha;
}
model {
  alpha ~ normal(0, 1);
  // implies: beta ~ multi_normal(mu, Sigma)
  ...

but I think what I need is a non-centered reparameterisation of beta ~ multi_normal_prec(mu, smooth_sd * Tau), not beta ~ multi_normal(mu, Sigma). In theory, is that simple to specify?

bgoodri commented 5 years ago

The stochastic representation of the multivariate normal is in terms of the Cholesky factor of the covariance matrix. I guess you could do something with the inverse of a Cholesky factor of a precision matrix, but that would require that you solve a potentially large triangular system at every leapfrog step.

csetraynor commented 5 years ago

I'm opening this issue so there is a forum in which to discuss progress on getting survival models into rstanarm.

I've started a new survival branch here. It currently has a stan_surv modelling function, calling a surv.stan model file. The modelling function returns an object of class stansurv that inherits stanreg.

The current task list looks something like:

  • [x] print method for stansurv
  • [ ] summary method for stansurv
  • [x] prior_summary method for stansurv
  • [ ] basehaz_summary method for stansurv
  • [ ] posterior_survfit method for stansurv
  • [ ] log_lik, loo, and waic method for stansurv
  • [ ] plot method that includes plot type for baseline hazard
  • [ ] add option for rcs splines for the log hazard (i.e. instead of B-splines)
  • [ ] get the piecewise constant baseline hazard from stan_jm working with stan_surv
  • [ ] attempt to implement a constrained log-likelihood by just shifting hazard (setting equal to 1E-8 or similar) when the hazard strays negative, see section 2.2 of this paper
  • [ ] add functionality for time-dependent effects (i.e. non-proportional hazards) -- should this use the survival package's tt approach, or something else (e.g. a tde() special function for use in the formula)?
  • [ ] add AFT models
  • [ ] add semi-parametric model (possibly like this paper)
  • [ ] add frailty models

Hi, could we do a quick update of all the milestones that are currently accomplished? I'd be able to add plot method and posterior_survfit method if they are not ticked yet.

ermeel86 commented 5 years ago

@sambrilleman good idea to use random walk (RW) priors for the spline coefficients to model the time-dependent effects. In general I think we should also offer / implement an RW prior for the M-/I-splines in the baseline hazard / cumulative baseline hazard (better an RW prior for $\log{\gamma}$, to ensure non-negativity). What do you think?
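
Concretely, the sort of thing I have in mind is the sketch below, where J is the number of basis terms and gamma are the M-spline coefficients (all names are placeholders, not code from the branch):

data {
  int<lower=1> J;              // number of M-spline basis terms
}
parameters {
  vector[J] log_gamma_raw;     // standardised random walk increments
  real<lower=0> rw_sd;         // random walk (smoothing) SD
}
transformed parameters {
  vector[J] log_gamma;
  vector[J] gamma;
  log_gamma[1] = log_gamma_raw[1];
  for (j in 2:J)
    log_gamma[j] = log_gamma[j-1] + rw_sd * log_gamma_raw[j];
  gamma = exp(log_gamma);      // exponentiating keeps the spline coefficients non-negative
}
model {
  log_gamma_raw ~ normal(0, 1);   // implies log_gamma[j] ~ normal(log_gamma[j-1], rw_sd)
  rw_sd ~ normal(0, 1);           // half-normal hyperprior on the smoothing SD
}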

sambrilleman commented 5 years ago

@ermeel86 Thanks, I would love your advice on this! I had tried to implement the prior distribution as proposed in Lang and Brezger's "Bayesian P-splines" paper, but got completely confused as to the appropriate Stan implementation. But am I correct in thinking that it is somewhat equivalent to a RW prior? (I've not really used RW priors before)

If a RW prior for the B-spline coefficients is the way to go, then what are your thoughts on this document? In the last section entitled "Location of Knots and the Choice of Priors" Milad discusses smoothing B-splines via a RW prior and the implementation seems to be much more straightforward than what I was attempting when messing around with the Lang and Brezger approach. The code that Milad provides is basically

transformed parameters {
  row_vector[num_basis] a;
  vector[num_data] Y_hat;
  a[1] = a_raw[1];
  for (i in 2:num_basis)
    a[i] = a[i-1] + a_raw[i]*tau;
  Y_hat = a0*to_vector(X) + to_vector(a*B);
}

where the vector of B-spline coefficients is a and the smoothing parameter (sd) is tau. That would be super easy to incorporate into stan_surv. Do you feel that would be an appropriate implementation?
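
(For completeness, I assume the priors that accompany that snippet are essentially the following -- my paraphrase of the case study, not a verbatim copy:)

model {
  // first-order random walk smoothing prior on the B-spline coefficients:
  // with the construction above, a_raw ~ normal(0, 1) implies a[i] ~ normal(a[i-1], tau)
  a_raw ~ normal(0, 1);
  // hyperprior on the smoothing SD (half-normal, since tau is declared with <lower=0>)
  tau ~ normal(0, 1);
}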

And as you mention, a random walk for $\log{\gamma}$ should also be easy enough.

sambrilleman commented 5 years ago

@csetraynor Thanks, I think I've pretty much sorted a plot.stansurv method for the baseline hazard and the time-dependent hazard ratio, and otherwise the method calls NextMethod (i.e. plot.stanreg) which seems to work for posterior densities, traceplots, etc. And I've gone and ticked off a few other things on that list, including posterior_survfit. Most are working to some degree, but buggy atm.

To make matters worse, I've muddled the branches a bit, because I wanted to incorporate interval censoring in both stan_surv and stan_jm, and it seems to be working for the former, but not for the latter. Argh. So I'm now working on this branch, which is quite a bit ahead of the survival branch, but has errors for stan_jm. I'll try to tidy some of this stuff up and get a clean-ish branch with up-to-date changes, but I think it might be a bit difficult to work out how we can coordinate working on such aspects at the same time.

ermeel86 commented 5 years ago

I had tried to implement the prior distribution as proposed in Lang and Brezger's "Bayesian P-splines" paper, but got completely confused as to the appropriate Stan implementation. But am I correct in thinking that it is somewhat equivalent to a RW prior? (I've not really used RW priors before)

See this (adapted from "Bayesian Regression modelling with INLA" by Wang et al.):

[image: ar_multivariate_normals -- https://user-images.githubusercontent.com/5255975/46920373-b1ae7780-cfed-11e8-9df5-e3aeccc590ed.png]

This applies for the lag-1 case. I think that one can recover Eq. 5 from the Lang and Brezger paper by taking the limit $\rho\rightarrow 1$. This would then also explain why they say

the prior (5) is improper.

We could also add a hyperprior for $\rho$ over $[-1,1]$. So from an analytic perspective I think the two approaches are equivalent; computationally I am not sure. Maybe @bgoodri can help out here?
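
(Spelling out the limit, assuming I have the scaling right: the joint precision of the lag-1 model is tridiagonal,

$$ Q_\rho = \frac{1}{\tau^{2}} \begin{pmatrix} 1 & -\rho & & \\ -\rho & 1+\rho^{2} & \ddots & \\ & \ddots & \ddots & -\rho \\ & & -\rho & 1 \end{pmatrix}, $$

and taking $\rho \rightarrow 1$ gives $D_1' D_1 / \tau^{2}$, with $D_1$ the first-difference matrix. That matrix is singular, which would explain the impropriety of prior (5).)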

bgoodri commented 5 years ago

Most of the time (i.e. all of the time as far as rstanarm is concerned) you would want to do the multivariate version of this rather than the marginal & conditionals version. But you would want to write a specialized function that does B' Q B for a tridiagonal Q and computes the log-determinant of a tridiagonal Q. Alternatively, you can post-multiply the inverse of the Cholesky factor of that Q by a row-vector of coefficients to non-center the betas.
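
For example, the specialised pieces could look something like this (just a sketch; d and e would be the main and off diagonal of the tridiagonal Q):

functions {
  // quadratic form x' Q x for a symmetric tridiagonal Q with main diagonal d
  // (length K) and off-diagonal e (length K - 1); runs in O(K)
  real quad_form_tridiag(vector x, vector d, vector e) {
    int K = rows(x);
    return dot_product(d, square(x))
           + 2 * dot_product(e, head(x, K - 1) .* tail(x, K - 1));
  }
  // log determinant of a positive-definite symmetric tridiagonal Q,
  // using the O(K) recursion for the LDL' pivots
  real log_det_tridiag(vector d, vector e) {
    int K = rows(d);
    vector[K] a;
    a[1] = d[1];
    for (k in 2:K)
      a[k] = d[k] - square(e[k - 1]) / a[k - 1];
    return sum(log(a));
  }
}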

ermeel86 commented 5 years ago

but I think what I need is a non-centered reparameterisation of beta ~ multi_normal_prec(mu, smooth_sd * Tau), not beta ~ multi_normal(mu, Sigma). In theory, is that simple to specify?

@sambrilleman couldn't we use the methods described in algorithms 2 and 3 of https://arxiv.org/abs/1801.03791 here? Especially the sparse back substitution algorithm mentioned there?

bgoodri commented 5 years ago

It isn't so complicated: If you can construct the precision matrix, you can use the inverse of its Cholesky factor in an uncentered parameterization of the coefficients. A similar thing came up in https://andrewgelman.com/2016/09/02/two-weird-tricks-for-fast-conditional-autoregressive-models-in-stan/#comment-302825
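
A sketch of what I mean is below; Tau is the penalty matrix used as the (unscaled) precision and assumed full rank here, and mu, smooth_sd, and K are placeholder names:

data {
  int<lower=2> K;
  vector[K] mu;
  matrix[K, K] Tau;                 // penalty matrix used as the unscaled precision
}
transformed data {
  matrix[K, K] L_tau;
  L_tau = cholesky_decompose(Tau);  // Tau = L_tau * L_tau'
}
parameters {
  vector[K] alpha;                  // auxiliary standard normal variables
  real<lower=0> smooth_sd;
}
transformed parameters {
  vector[K] beta;
  // beta = mu + smooth_sd * inverse(L_tau') * alpha, which implies
  // beta ~ multi_normal_prec(mu, Tau / square(smooth_sd))
  beta = mu + smooth_sd * mdivide_right_tri_low(alpha', L_tau)';
}
model {
  alpha ~ normal(0, 1);
  smooth_sd ~ normal(0, 1);         // e.g. a half-normal hyperprior
}

The triangular solve hidden in mdivide_right_tri_low is the part that can get expensive for large K, as mentioned above.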

ermeel86 commented 5 years ago

There you say

However, this triangular solve is an O(N^3) operation, so it is slow for large N (which could easily be much larger than in the Scottish Lip Cancer example).

So if this is the case, the sparse back substitution algorithm from the above reference is asymptotically more efficient computationally, since it runs in linear time. However, I think you would have to do the inverse calculation only once plus a matrix multiplication at every leapfrog step, whereas the approach I refer to would do only a linear operation at every leapfrog step, right?

bgoodri commented 5 years ago

Yeah, you can write your own code to do the bidiagonal solve.
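
For instance, for an RW1-type penalty the Cholesky factor of the precision is bidiagonal, so the solve reduces to back substitution along these lines (a sketch):

functions {
  // solve U x = b in O(K), where U is upper bidiagonal with main diagonal d
  // (length K) and superdiagonal s (length K - 1), e.g. U = L' for the
  // bidiagonal Cholesky factor L of a random walk precision matrix
  vector bidiag_back_solve(vector d, vector s, vector b) {
    int K = rows(b);
    vector[K] x;
    x[K] = b[K] / d[K];
    for (i in 1:(K - 1)) {
      int k = K - i;
      x[k] = (b[k] - s[k] * x[k + 1]) / d[k];
    }
    return x;
  }
}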

sambrilleman commented 5 years ago

@ermeel86 @bgoodri Just to follow up on this. I've opened a pull request with a first working version of stan_surv that I hope is ready for release. I've just implemented the time-dependent effects with a random walk prior using the conditional specification as described in Milad's case study. You'll find the Stan code here. It seems to be working, but I'm happy for someone to implement the fancier multivariate version (discussed above) if you want!

karlsmithbyrne commented 5 years ago

Hi All, I apologise if this is not the correct place to ask, but I have been using stan_surv and am having difficulty with hierarchical modelling. The model is a simple time-to-event analysis with a single predictor and a grouping factor with three levels, where each level represents a different cancer.

I keep getting errors such as Error in scale(PRS_Score) | Cancer : operations are possible only for numeric, logical or complex types

when the grouping variable is a factor variable with three levels.

When I change the group variable to a numeric I get this error message: Error: Cannot deal with empty interaction levels found in columns: PRS_Score | CancerTRUE

The model is not complex:

Mod <- stan_surv(formula = Surv(y, failed) ~ PRS_Score + (PRS_Score | Cancer), data = comb, basehaz = "exp")

But it seems to work fine if I try and run it as a stan_glmer ignoring the time variable:

fit4 <- stan_glmer(failed ~ PRS_Score + (PRS_Score - 1 | Cancer), data = comb, family = binomial(link = "logit"), cores = 4, iter = 500)

and stan_surv works fine when I don't include the hierarchical parameter.

Just wondering if I am doing anything wrong here that I can work on.

Thank you all in advance!

sambrilleman commented 5 years ago

Hi Karl

I've not yet implemented hierarchical parameters for stan_surv. Sorry that the error message was so useless, I will fix that.

Hierarchical survival models will happen eventually. But I ideally wanted to get a version of stan_surv out on CRAN and roadtested a bit before adding too many features.

karlsmithbyrne commented 5 years ago

Ahhhhh ok no problem! It is already really very good! Thank you. Is there any chance you know of any workarounds I may use in the meantime? Again though, thank you!

sambrilleman commented 5 years ago

@karlsmithbyrne If you want to go the Bayesian route with a proportional hazards model, then I'm not sure there are many alternatives. I think brms currently fits exponential and Weibull accelerated failure time (AFT) models with group-specific parameters. I see that you were trying to use an exponential baseline hazard with stan_surv in the example above. The exponential or Weibull proportional hazards model is mathematically equivalent to an exponential or Weibull AFT model, just reparameterised. This is true without group-specific parameters anyway -- I'm not sure if it holds true with group-specific parameters -- maybe someone else (@ermeel86 @statwonk) can help clarify whether that is the case. So, one option is to just use brms to fit an exponential/Weibull AFT model, and if you want hazard ratios then you can obtain them using a one-to-one transformation of the AFT coefficients.

There may be other packages I'm not aware of that fit Bayesian PH models with random effects -- you could check the CRAN task view.

In any case, with only three levels for your grouping factor Cancer, you are going to have very little information with which to estimate the variance parameter for the distribution of the Cancer-specific slope parameters. Perhaps you would be better off treating it as a fixed effect and modelling the differences between cancers via an interaction term, i.e. using a model like:

Mod <- stan_surv(formula = Surv(y, failed) ~ PRS_Score + PRS_Score * Cancer, data = comb, basehaz = "exp")

Although I'm not familiar with the research question etc.

padpadpadpad commented 4 years ago

Hi guys.

Apologies if this is not the correct place to ask this, but I thought I would post about the survival models in this open Issue rather than starting a new one.

Firstly, I would like to say the preprint on arXiv, "Bayesian Survival Analysis Using the rstanarm R Package", is excellent. I would recommend putting some code up on how to install the specific feature branch though, as I don't think it's intuitive from the preprint how to download that version of the rstanarm package from GitHub to get the stan_surv() functions (devtools::install_github('stan-dev/rstanarm@feature/frailty-models')).

Secondly, I am implementing a random effects model and am trying to create population-level predictions (i.e. averaging over the random effects but having different predictions for each group-level effect).

In this vignette, it seems that I want to set dynamic = FALSE in posterior_survfit(), but when I create dummy random effect variables that are not currently used, I get an error that my factor has new levels.

When I supply levels of the random factor that are already present, the code works and I think having dynamic = FALSE still only conditions on the group-level effects and marginalises over individual level estimates.

Just checking that (a) my interpretation is correct and (b) highlighting that you cannot specify new levels in this branch atm when using posterior_survfit().

Many thanks and super impressed with the models and the vignettes and the preprint.

sambrilleman commented 4 years ago

Hey @padpadpadpad... so sorry it has taken me this long to get back to you!!

I would recommend possibly putting some code up on how to install the specific feature branch though, as I don't think its intuitive how to download that version of the GitHub rstanarm package (get the stan_surv() functions) from your preprint (devtools::install_github('stan-dev/rstanarm@feature/frailty-models')).

That is a great idea, I will aim to do that soon. However, the more up to date branch is in fact feature/survival -- it includes everything on feature/frailty-models as well as a few additional more recent commits. I'd suggest you install feature/survival (sorry for the confusion!).

In this vignette, it seems that I want to set dynamic = FALSE in posterior_survfit()

That vignette is specific to a certain class of survival models -- joint models for longitudinal and survival data. The posterior_survfit() method for stan_jm models has the dynamic argument, but there is no such argument for the posterior_survfit() method for stan_surv models. After you install the feature/survival branch, check out the function signatures under ?posterior_survfit and you should see that only the method for stanjm uses the dynamic argument (since it requires conditioning on longitudinal data).

but when I create dummy random effect variables that are not currently used, I get an error that my factor has new levels.

From memory, you should be able to provide new levels. If the newdata in posterior_survfit.stansurv() has random effect factor levels that were in the original data then it should condition on them, and if they were not in the original data then it should marginalise over them. I'm not sure why you are getting the error. Perhaps install the feature/survival branch and then try run the following example, which demonstrates conditioning on: (i) a site with a positive frailty (i.e. higher hazard, worse survival), (ii) a site with a negative frailty (i.e. lower hazard, better survival), and (iii) a new site that was not in the original data, so the predictions are marginalised over the random effects. All have trt = 0 so that random effect factor levels are the only thing that differ across the prediction covariate data:

mod <- stan_surv(
  formula = Surv(eventtime, status) ~ trt + (1 | site), 
  data    = frail[1:40,], 
  basehaz = "exp", 
  chains  = 1,
  refresh = 0,
  iter    = 600,
  seed    = 123)

# (i) prediction data for an existing site with a positive random effect
nd1 = frail[1,] 

# (ii) prediction data for an existing site with a negative random effect
nd2 = frail[12,] 

# (iii) prediction data for a new site (random effects will be marginalised over)
nd3 = nd1
nd3[['id']] = 999
nd3[['site']] = 9999

# predictions
ps1 = posterior_survfit(mod, nd1)
ps2 = posterior_survfit(mod, nd2)
ps3 = posterior_survfit(mod, nd3)

head(ps1)
head(ps2)
head(ps3)

The example should show that, as expected, ps1 has worse survival, ps2 has better survival, and ps3 is in between (essentially a random intercept = 0) but with much wider confidence interval due to the uncertainty in marginalising over the random intercepts.

If that example doesn't work for you or doesn't help clarify, then perhaps open a separate GitHub issue and I can try to debug it for you!

padpadpadpad commented 4 years ago

Hey @sambrilleman thank you so much for your response.

I have installed from the survival branch now but cannot get the man pages to install (see this issue). This means I cannot look at ?posterior_survfit unfortunately.

I did manage to get the code working though, so that's great. In ps3, we are predicting for a new level of the random effect and marginalising over the random effect, but is there also a way to not include the random effects in the prediction and instead use only the fixed/treatment effects (akin to re.form = NA in posterior_predict)? That is, to get predictions over the treatment effects but not the random effects?

Many thanks Dan

sambrilleman commented 4 years ago

I did manage to get the code working though so that's great. In ps3, we are predicting for a new level of the random effect and marginalising over the random effect, but is there also a way to not include the random effects in the prediction and instead use only the fixed/treatment effects (akin to re.form= NA in posterior_predict). So to get predictions over the treatment effects but not the random effects?

hmm, sorry I don't think I implemented that! :-( I guess what you are suggesting is akin to a new group/cluster/factor level for the random effect, but you know with absolute certainty that their random effect value is 0. So I'd almost argue that such a prediction isn't a realistic acknowledgement of the uncertainty in the model...? But maybe I am wrong...

padpadpadpad commented 4 years ago

Hi Sam

These are valid points for the prediction of new individuals. I took the approach of standardising over all individuals within each treatment. But it meant I had to do a separate call of posterior_survfit() for each treatment, instead of a single call to posterior_survfit() which standardises over each individual for each covariate.

This approach seems to work for what I want. Thank you for responding and being so prompt with your responses.

sambrilleman commented 4 years ago

Sweet, glad you found a solution that seems to get you what you need!

Yeah, you can even do causal inference type things where you standardise over all the individuals (i.e. from both treatments), first assuming they have treatment fixed to 0 and then treatment fixed to 1, if that's what you want, e.g. something like:

mod <- stan_surv(
    formula = Surv(eventtime, status) ~ trt + (1 | site), 
    data    = frail, 
    basehaz = "exp", 
    chains  = 1,
    refresh = 0,
    iter    = 600,
    seed    = 123)

# create two prediction datasets each based on the original data
frail_0 = frail
frail_1 = frail

# fix the covariate value in the two prediction datasets
frail_0['trt'] = 0
frail_1['trt'] = 1

# predict using all ids and sites from the original data, but fixing the covariate value
ps_0 = posterior_survfit(mod, frail_0, standardise = TRUE, times = 0)
ps_1 = posterior_survfit(mod, frail_1, standardise = TRUE, times = 0)

plot(ps_0)
plot(ps_1)

but perhaps reducing frail_0 and frail_1 to one row per site (since the original data would have multiple rows per site, which would carry through as multiple rows in the prediction data, and then essentially end up like weightings in the standardised predictions I guess).

Or whatever makes sense for the type of comparisons you want to make!

ouzor commented 4 years ago

Hi all and thanks for making survival analysis with rstanarm a reality!

I encountered a small bug when working through and extending the examples in the paper. In section 6.6 we fit a simple multilevel model. Elsewhere in the paper it is mentioned that ranef() should return the random effect parameter estimates, but it does not seem to work:

mod_randint <-
  rstanarm::stan_surv(
    formula = Surv(eventtime, status) ~ trt + (1 | site),
    data    = frail,
    basehaz = "exp"
  )
rstanarm::ranef(mod_randint)
Error: This method is for stan_glmer and stan_lmer models only.

I am using Linux, and the rstanarm survival branch (version 2.19.1).

EDIT: After digging a bit deeper, it seems that ranef.stanreg calls .glmer_check, which in turn calls is.mer, which fails because the model object lacks the correct class and is also missing the glmod component, whatever that is.

What is btw the preferred way to post issues related to the survival functionality in rstanarm? This issue or a separate issue, or something else?

And while I'm at it, do you have any estimate on when you could be merging the feature/survival branch and getting the functionality to CRAN?

sambrilleman commented 4 years ago

Hi @ouzor. Sorry for the slow reply.

Coincidental timing actually! This same issue also recently got raised by someone else. See the issue here.

My suggestion there is just to get the random effects from my_model$stan_summary. That should also contain what you need.

You are correct about where the issue lies. Hopefully I can just add is.surv to the .glmer_check and it will resolve the issue, but I'll have to check.

For future reference -- yes, it is probably better to open a separate issue for bugs like this. It is easier to track in separate issues.

I'm still not sure when the survival branch can get pushed to CRAN. Would have to check with @bgoodri.

Cheers!

ouzor commented 4 years ago

Thanks for the reply @sambrilleman, using stan_summary is good for now!

jtelleriar commented 4 years ago

Yes, although stan_surv() is not complete, it would be nice to have a minimal version of this feature/survival branch merged into master, so that users can start using it.

jgabry commented 4 years ago

Indeed that would be nice! Unfortunately it uses too much RAM during compilation to get it on CRAN at the moment (at least I think that's the holdup). At one point it seemed like there was a plan to get the memory usage down but I'm not sure where that stands. @bgoodri ?

In the meantime, we're trying to see if we can build pre-compiled binaries that we can distribute so people can install it easily. Currently it seems like that is successful on Mac but not yet on Windows: https://github.com/stan-dev/r-packages/issues/2#issuecomment-647557697

jtelleriar commented 4 years ago

And what about compiling survival models at runtime, or on first use, by using Rcpp::sourceCpp()? An install_stan_surv() function could even be considered.

It is the approach brms uses as indicated in its paper: https://cran.r-project.org/web/packages/brms/vignettes/brms_overview.pdf


ermeel86 commented 4 years ago

This is really an issue. I have about 20 colleagues who would like to try stan_surv (after a seminar I gave), but installation fails for most of them (mostly on Windows). I personally wasn't able to compile it on Windows, nor could I prepare a binary package for Windows.

jtelleriar commented 4 years ago

The same issue occurred to me! I tried compiling on Windows, and was not able to with my 8GB RAM laptop.

ermeel86 commented 4 years ago

Also, I just followed live memory usage on my computer, and it did not reach the memory limit (I have 32GB), but compilation failed because it could not allocate 40099448 bytes, which is about 40MB...

paul-buerkner commented 4 years ago

This is the size that goes over the allocatable memory. This size does not represent the total memory used itself.

ermeel86 commented 4 years ago

So this means it is not really a memory problem per se (i.e. people not having enough memory), but another compile / C++ issue? Is it clear what the underlying mechanism is?

paul-buerkner commented 4 years ago

To clarify my above post: when your R can allocate, say, 16 GB of memory and you end up not being able to allocate 40 MB as per the above message, this means that R tried to allocate 16 GB + 40 MB in total, at which point it failed.

sambrilleman commented 4 years ago

@ermeel86 Yeah, it is quite a disaster. I personally cannot get the build working either. I have tried both R 4.0.2 and R 3.6.2 (which was the last version I was using when I was actively working on this stuff), but have had no success with either.

One additional problem is that I haven't been actively working on this branch for 6 months or more, so I'm not even sure at which point the build started to break (e.g. R 4.x, or a Windows update, or something else).

I know very little about these types of compiler issues, or even how the compilation infrastructure for rstanarm works in detail. So I am as stuck as most of the users.

One thought I had was attempting to update the feature/survival branch (or a branch off of it) with the latest changes from the master branch to see if that helped. But I even have trouble installing the master development branch of rstanarm at the moment. 😞

jgabry commented 4 years ago

@bgoodri mentioned at the Stan meeting today that adding "--no-multiarch" might help on Windows. That can go in the build_opts argument to devtools::install_github() or the args argument to devtools::build() (if you're building locally). Can someone with a Windows machine try that out? (I don't have one).

It was also discussed in the Stan meeting today that we can probably fix this by using standalone generated quantities for all of our Stan programs in rstanarm. @bgoodri said he will start working on this soon.

ermeel86 commented 4 years ago

To clarify my above post: when your R can allocate, say, 16 GB of memory and you end up not being able to allocate 40 MB as per the above message, this means that R tried to allocate 16 GB + 40 MB in total, at which point it failed.

Is there a way to increase the memory for R for this purpose? I am asking because I checked the overall system memory consumption on my computer during compilation and there were about 18 GB left (unless this was a super fast jump which wasn’t visible graphically).

ermeel86 commented 4 years ago

@bgoodri mentioned at the Stan meeting today that adding "--no-multiarch" might help on Windows. That can go in the build_opts argument to devtools::install_github() or the args argument to devtools::build() (if you're building locally). Can someone with a Windows machine try that out? (I don't have one).

It was also discussed in the Stan meeting today that we can probably fix this by using standalone generated quantities for all of our Stan programs in rstanarm. @bgoodri said he will start working on this soon.

Thanks @jgabry, this is really appreciated. I just tried your compile flag suggestion, and it printed a warning that "--no-multiarch" is unknown. I am on Windows 10, with R 4.0.0. Another question: is it normal behaviour that it compiles under i386 with a 32-bit compiler, even when on a 64-bit system?

paul-buerkner commented 4 years ago

The memory.limit() function can be used to control the amount of memory available to R.

ermeel86 commented 4 years ago

The memory.limit() function can be used to control the amount of memory available to R.

If I interpret the output of that function on my machine correctly, then it is approximately 32GB. Now I don't understand how your previous comment can explain the memory error: R failed to get the additional 40 MB, when I had at least 16 GB left during the test and my machine has 32 GB in total, which roughly coincides with the limit...