From my interpretation:
'prior =' specifies the distribution of all random slope parameters
'prior_intercept =' specifies the distribution of all random intercept parameters
'prior_ops =' specifies the cauchy shape variable for all variance parameters
'prior_covariance = decov(shape=)' specifies the lkjcorr shape parameter for the covariance matrix.
This is not quite right, but it's definitely confusing so hopefully I can help clarify:
prior specifies the distribution(s) for all slope parameters (coefficients) that do not vary by group.

prior_intercept specifies the distribution for the model's global intercept.

prior_covariance actually specifies the entire structure for all parameters that vary by group. Because those parameters have zero mean by design (i.e. they are shifts from the global intercept parameter or the common slope parameter), all that needs to be specified is the prior for the covariance matrix (or matrices).

If you want to change the parameter for the LKJ prior on the underlying correlation matrix, then you actually want to specify the regularization parameter of the decov function, not the shape parameter. The relevant documentation is at help("priors", package = "rstanarm"), where the Covariance Matrices section says:

The prior for a correlation matrix is called LKJ whose density is proportional to the determinant of the correlation matrix raised to the power of a positive regularization parameter minus one. If regularization = 1 (the default), then this prior is jointly uniform over all correlation matrices of that size. If regularization > 1, then the identity matrix is the mode and, in the unlikely case that regularization < 1, the identity matrix is the trough.
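Based on the documentation quoted above, a minimal sketch of how one might request an LKJ(2) prior on the underlying correlation matrix (the value 2 is just the one asked about in this thread, not a recommendation):

library(rstanarm)
# The LKJ parameter comes from decov()'s regularization argument, not its
# shape argument (shape belongs to the gamma prior discussed later on).
lkj2_cov_prior <- decov(regularization = 2)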
So, to summarize, you can basically think of the prior and prior_intercept arguments as doing the same thing for stan_glmer as they do for stan_glm, i.e. controlling the priors for the parameters that don't vary by group. The prior_covariance argument specifies everything about the group-specific parameters. This is a bit different from what the rethinking package does, and it's a bit more restrictive (intentionally), but we've found that doing it this way works very well in general.
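As a rough sketch of how those three arguments fit together in a stan_glmer() call (the formula, data frame, and variable names here are hypothetical placeholders, and the hyperparameter values are only illustrative):

fit <- stan_glmer(
  y ~ x + (1 + x | g),                # hypothetical outcome, predictor, grouping factor
  data = dat,
  prior = normal(0, 10),              # slopes that do not vary by group
  prior_intercept = normal(0, 10),    # the global intercept
  prior_covariance = decov()          # everything about the group-varying parameters
)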
Hope that helps, but definitely follow up if you have further questions. This stuff can definitely be pretty subtle!
Thanks so much for your feedback - it's a huge help! Unless I've misunderstood, I believe I've specified everything as I had with the rethinking model, except for the prior_covariance...
However, I'm still unsure if I've interpreted prior_ops correctly. Does prior_ops indeed control the priors of all variance parameters? If I would like to specify a half-Cauchy prior on all of them (dcauchy(0,2), to be specific), would this be correct: prior_ops = prior_options(prior_scale_for_dispersion = 2)?
Thank you again~
Glad that helped! Regarding the variances, which variance parameters are you referring to, the error variance (i.e. sigma^2 for a linear model) or the variance parameters on the diagonal of the covariance matrix for the varying parameters (aka "random effects", although we try not to use that term)?
I'm referring to both variance parameters, actually. I see that prior_ops refers to the standard error (and so assume that prior_ops = prior_options(prior_scale_for_dispersion = 2) corresponds to dcauchy(0,2), correct?), but was unable to find where to specify priors for the diagonal of the covariance matrix.
Right, prior_scale_for_dispersion in prior_ops controls the scale of a Cauchy prior on so-called dispersion parameters. This includes the error standard deviation (sigma) for linear models, the over-dispersion parameter for negative binomial models, the shape parameter for Gamma regression models, etc. In the not so distant future we'll be abandoning prior_ops and replacing it with a much clearer and more intuitive way of specifying its content.
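For reference, a sketch of that (soon-to-be-replaced) interface as it appears elsewhere in this thread, putting a half-Cauchy scale of 2 on the dispersion parameter:

# Older rstanarm interface discussed in this thread; the value 2 is the one
# asked about above, not a general recommendation.
disp_prior <- prior_options(prior_scale_for_dispersion = 2)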
The diagonal of the covariance matrix is implied by the decov prior. This is described to a certain extent in help("priors", package = "rstanarm") and in more detail in the vignette for stan_glmer. In that vignette, take a look at the Details subsection of the Priors on covariance matrices section. It might be a bit unfamiliar and a bit confusing at first, but once you understand it it's really cool! I'll give a brief summary here, and then let me know if you still have questions after also checking out the vignette (totally fine if you do).

Essentially, instead of estimating the variances directly, we estimate a scalar parameter representing the total variance and a simplex vector where each element represents a proportion of the total variance. That is, for a J by J covariance matrix Sigma for a parameter vector theta of length J we set

diag(Sigma) = J * tau^2 * pi

where J * tau^2 is the total variance (equal to the trace of the matrix Sigma), and pi is the simplex vector, where pi[j] is the proportion of the total variance attributable to theta[j]. The concentration argument of the decov function controls the parameter of a Dirichlet prior on pi, and the shape and scale arguments are used to control the parameters of a Gamma prior on tau.
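To make the decomposition concrete, here is a small illustrative simulation (plain R, not rstanarm internals) of one draw of the diagonal of Sigma, assuming J = 4, a Gamma(shape = 1, scale = 1) prior on tau, and a symmetric Dirichlet(1) prior on pi:

set.seed(123)
J <- 4
tau <- rgamma(1, shape = 1, scale = 1)   # scalar whose square scales the total variance
g <- rgamma(J, shape = 1, rate = 1)      # a Dirichlet(1) draw via normalized gamma draws
pi_simplex <- g / sum(g)                 # proportions of the total variance, sums to 1
diag_Sigma <- J * tau^2 * pi_simplex     # the variances of theta[1], ..., theta[J]
sum(diag_Sigma)                          # equals J * tau^2, the trace of Sigma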
Jonah
Thanks again - I’ve been studying the help and vignette pages. Sorry to say I have more questions - I’ve numbered them to help keep track:
1) Please let me know if I’m off base here: the group-specific variances (the diagonal of the covariance matrix) are all contained in the simplex vector, and it is this vector over which we specify one prior, the Dirichlet prior.
2) If that’s correct, it would be impossible to set individual Cauchy priors on these variances, no? Can you suggest how to specify a Dirichlet prior in accordance with a regularized half-Cauchy with scale = 2? From the help page, it looks like I’d use either the decov() or dirichlet() functions, but I’m not sure which… would it be something like:
decov(regularization = 1, concentration = 2, shape = 1, scale = 1) or dirichlet(concentration = 2)
A couple other things: 3) I’m using the get_stancode(be_mod1$stanfit) to extract stan output… Is there a specification to format the output? My output is a block of unformatted code.
4) Can you tell me when you expect to update the prior_ops function? I ask because I’m writing up a tutorial on Bayesian hierarchical modeling, and would like to feature rstanarm.
Really appreciate your help!
Hi Chelsea, some comments below:
On Sat, Apr 2, 2016 at 9:36 AM, Chelsea Muth notifications@github.com wrote:
Sorry to say I have more questions
No problem at all!
1) Please let me know if I’m off base here: the group-specific variances (the diagonal of the covariance matrix) are all contained in the simplex vector, and it is this vector over which we specify one prior, the Dirichlet prior.
They're contained in the vector J * tau^2 * pi. Here J is known, tau is a scalar parameter given a gamma prior (its hyperparameters are decov's shape and scale arguments), and the product J * tau^2 represents the total variance (the sum of the individual variances). And pi is the simplex vector whose elements represent the proportion of that total variance attributable to each of the variables (the variables whose covariance matrix we're talking about). pi gets the dirichlet prior. To control its concentration parameter you use the concentration argument to decov().

Essentially, the arguments to decov are the hyperparameters needed for the prior distributions on each of the parameters we use to construct the covariance matrix. The regularization argument is for the eta parameter of the lkj prior on the correlation matrix, concentration is for the dirichlet prior on the simplex vector pi, and shape and scale are for the gamma prior on tau. So the prior on the covariance matrix is composed from these priors on the different elements of its decomposition.
2) If that’s correct, it would be impossible to set individual Cauchy priors on these variances, no? Can you suggest how to specify a Dirichlet prior in accordance with a regularized half-Cauchy with scale = 2? From the help page, it looks like I’d use either the decov() or dirichlet() functions, but I’m not sure which… would it be something like:
In this case, like I mentioned above, you just use the concentration argument to decov(). The dirichlet prior is handled internally so you don't need to explicitly call dirichlet().
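So, in code, question 2) boils down to something like this (hyperparameter values copied from your question, not recommendations):

# dirichlet() is not called directly; the concentration argument below is what
# reaches the Dirichlet prior on the simplex vector pi.
my_cov_prior <- decov(regularization = 1, concentration = 2, shape = 1, scale = 1)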
You're right that it's not possible to get exactly half-Cauchy(0,2) priors on the diagonal elements of the covariance matrix. But all the evidence I've seen favors the decov prior. The half-Cauchy is fine (although you probably want it for the standard deviations and not the variances) but we've found the way the decov prior decomposes the covariance matrix is quite robust (you rarely need to tweak the defaults to get really good results). @bgoodri, is it possible to mess with the decov arguments and get close to what @muthca is looking for?
3) I’m using the get_stancode(be_mod1$stanfit) to extract stan output… Is there a specification to format the output? My output is a block of unformatted code.
I think you can just put the call to get_stancode inside cat() and it should format the text properly.
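For example (be_mod1 being the fitted object from your question):

cat(get_stancode(be_mod1$stanfit))   # prints the Stan program with its line breaks intact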
4) Can you tell me when you expect to update the prior_ops function? I ask because I’m writing up a tutorial on Bayesian hierarchical modeling, and would like to feature rstanarm.
Cool! We'll replace it in the next release or the release after that, which should hopefully be within the month of April. But I don't think prior_ops is particularly important, and probably not worth mentioning in a tutorial unless absolutely necessary for some reason. @bgoodri do you agree?
Really appreciate your help!
Happy to help. Looking forward to seeing the tutorial you're working on. Let us know if you want us to take a look at some point.
Jonah
It is correct that it is not possible to put a half-Cauchy prior on the variances or standard deviations in models with group-specific parameters. That is unfortunate when trying to replicate something in the rethinking package, but it is a net positive when trying to model data. You may be able to get something resembling a half-Cauchy prior by making the scale parameter in decov very large.
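If you want to experiment with that, something along these lines might be a starting point (the value 10 is arbitrary and purely illustrative):

# Larger scale on tau's gamma prior, per the suggestion above; not a default
# or a recommendation, just an example value to try.
heavier_tail_prior <- decov(shape = 1, scale = 10)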
The half-Cauchy prior has long tails that sometimes work well and sometimes lead to sampling problems (depending on the data). The model fitting functions in the rstanarm package, such as stan_glmer, are intended to work well for almost all data that are generated by the model being estimated.
Also I agree with @jgabry that the less said about prior_ops, the better. And if there is something in your tutorial that you think is not adequately addressed by the rstanarm vignettes, consider making a pull request to improve the vignettes.
Thanks so much @jgabry and @bgoodri. I’m grateful for the thorough and timely feedback. Looking forward to pulling this tutorial together and I really appreciate your offer to take a look at it.
Enjoy your weekend, Chelsea
I'm looking for some clarification on prior specification with rstanarm, since I’m unsure about my interpretation of the documentation. In the code below, I am trying to recreate a cubic multilevel model in rstanarm based on an original model that I specified with the rethinking package, which requires individual specification of the priors on each parameter.
Below, have I specified the rstanarm priors to match the rethinking priors? Meaning, I would like NORMAL(0,10) priors on all intercepts and slopes, CAUCHY(0,2) priors on all variance parameters, and a LKJCORR(2) prior on the covariance matrix.
From my interpretation:
'prior =' specifies the distribution of all random slope parameters
'prior_intercept =' specifies the distribution of all random intercept parameters
'prior_ops =' specifies the cauchy shape variable for all variance parameters
'prior_covariance = decov(shape=)' specifies the lkjcorr shape parameter for the covariance matrix.
Is this correct? Is there a different approach to go in and specify priors for each parameter individually?
Thanks very much, Chelsea
CODE FOR COMPARING PRIOR SPECIFICATION IN RSTANARM AND RETHINKING MODELS
library(rethinking)
library(rstan)
library(rstanarm)
1. Cubic unconditional - rstanarm
be_mod3 <- try(stan_lmer(
  cort ~ (1 + time_c + I(time2) + I(time3)) + ((1 + time_c + I(time2) + I(time3)) | id),
  data = dccE,
  REML = FALSE,
  prior = normal(0, 10),
  prior_intercept = normal(0, 10),
  prior_ops = prior_options(prior_scale_for_dispersion = 2),
  prior_covariance = decov(shape = 2)
))
2. Cubic unconditional - rethinking
mcort_c <- map2stan(
  alist(
    cort ~ dnorm(mu, sigma),
    mu <- a_ind[id] + (b_ind[id])*time_c