From my interpretation:
'prior =' specifies the distribution of all random slope parameters
'prior_intercept =' specifies the distribution of all random intercept parameters
'prior_ops =' specifies the cauchy shape variable for all variance parameters
'prior_covariance = decov(shape=)' specifies the lkjcorr shape parameter for the covariance matrix.
This is not quite right, but it's definitely confusing so hopefully I can help clarify:
prior specifies the distribution(s) for all slope parameters (coefficients) that do not vary by group.

prior_intercept specifies the distribution for the model's global intercept.

prior_covariance actually specifies the entire structure for all parameters that vary by group. Because those parameters have zero mean by design (i.e. they are shifts from the global intercept parameter or the common slope parameter), all that needs to be specified is the prior for the covariance matrix (or matrices).

If you want to change the parameter for the LKJ prior on the underlying correlation matrix, then you actually want to specify the regularization parameter of the decov function, not the shape parameter. The relevant documentation is at help("priors", package = "rstanarm"), where the Covariance Matrices section says:

The prior for a correlation matrix is called LKJ whose density is proportional to the determinant of the correlation matrix raised to the power of a positive regularization parameter minus one. If regularization = 1 (the default), then this prior is jointly uniform over all correlation matrices of that size. If regularization > 1, then the identity matrix is the mode and, in the unlikely case that regularization < 1, the identity matrix is the trough.
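Based on the documentation quoted above, a minimal sketch of how one might request an LKJ(2) prior on the underlying correlation matrix (the value 2 is just the one asked about in this thread, not a recommendation):

library(rstanarm)
# The LKJ parameter comes from decov()'s regularization argument, not its
# shape argument (shape belongs to the gamma prior discussed later on).
lkj2_cov_prior <- decov(regularization = 2)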
So, to summarize, you can basically think of the prior and prior_intercept arguments as doing the same thing for stan_glmer as they do for stan_glm, i.e. controlling the priors for the parameters that don't vary by group. The prior_covariance argument specifies everything about the group-specific parameters. This is a bit different from what the rethinking package does, and it's a bit more restrictive (intentionally), but we've found that doing it this way works very well in general.
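As a rough sketch of how those three arguments fit together in a stan_glmer() call (the formula, data frame, and variable names here are hypothetical placeholders, and the hyperparameter values are only illustrative):

fit <- stan_glmer(
  y ~ x + (1 + x | g),                # hypothetical outcome, predictor, grouping factor
  data = dat,
  prior = normal(0, 10),              # slopes that do not vary by group
  prior_intercept = normal(0, 10),    # the global intercept
  prior_covariance = decov()          # everything about the group-varying parameters
)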
Hope that helps, but definitely follow up if you have further questions. This stuff can definitely be pretty subtle!
Thanks so much for your feedback - it's a huge help! Unless I've misunderstood, I believe I've specified everything as I had with the rethinking model, except for the prior_covariance...
However, I'm still unsure if I've interpreted prior_ops correctly. Does prior_ops indeed control the priors of all variance parameters? If I would like to specify a half-Cauchy prior on all of them (dcauchy(0,2), to be specific), would this be correct: prior_ops = prior_options(prior_scale_for_dispersion = 2)?
Thank you again~
Glad that helped! Regarding the variances, which variance parameters are you referring to, the error variance (i.e. sigma^2 for a linear model) or the variance parameters on the diagonal of the covariance matrix for the varying parameters (aka "random effects", although we try not to use that term)?
I'm referring to both variance parameters, actually. I see that prior_ops refers to the standard error (and so assume that prior_ops = prior_options(prior_scale_for_dispersion = 2) corresponds to dcauchy(0,2), correct?), but was unable to find where to specify priors for the diagonal of the covariance matrix.
Right, prior_scale_for_dispersion in prior_ops controls the scale of a Cauchy prior on so-called dispersion parameters. This includes the error standard deviation (sigma) for linear models, the over-dispersion parameter for negative binomial models, the shape parameter for Gamma regression models, etc. In the not so distant future we'll be abandoning prior_ops and replacing it with a much clearer and more intuitive way of specifying its content.
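For reference, a sketch of that (soon-to-be-replaced) interface as it appears elsewhere in this thread, putting a half-Cauchy scale of 2 on the dispersion parameter:

# Older rstanarm interface discussed in this thread; the value 2 is the one
# asked about above, not a general recommendation.
disp_prior <- prior_options(prior_scale_for_dispersion = 2)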
The diagonal of the covariance matrix is implied by the decov prior. This is described to a certain extent in help("priors", package = "rstanarm") and in more detail in the vignette for stan_glmer. In that vignette, take a look at the Details subsection of the Priors on covariance matrices section. It might be a bit unfamiliar and a bit confusing at first, but once you understand it it's really cool! I'll give a brief summary here, and then let me know if you still have questions after also checking out the vignette (totally fine if you do).

Essentially, instead of estimating the variances directly, we estimate a scalar parameter representing the total variance and a simplex vector where each element represents a proportion of the total variance. That is, for a J by J covariance matrix Sigma for a parameter vector theta of length J we set

diag(Sigma) = J * tau^2 * pi

where J * tau^2 is the total variance (equal to the trace of the matrix Sigma), and pi is the simplex vector, where pi[j] is the proportion of the total variance attributable to theta[j]. The concentration argument of the decov function controls the parameter of a Dirichlet prior on pi, and the shape and scale arguments are used to control the parameters of a Gamma prior on tau.
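To make the decomposition concrete, here is a small illustrative simulation (plain R, not rstanarm internals) of one draw of the diagonal of Sigma, assuming J = 4, a Gamma(shape = 1, scale = 1) prior on tau, and a symmetric Dirichlet(1) prior on pi:

set.seed(123)
J <- 4
tau <- rgamma(1, shape = 1, scale = 1)   # scalar whose square scales the total variance
g <- rgamma(J, shape = 1, rate = 1)      # a Dirichlet(1) draw via normalized gamma draws
pi_simplex <- g / sum(g)                 # proportions of the total variance, sums to 1
diag_Sigma <- J * tau^2 * pi_simplex     # the variances of theta[1], ..., theta[J]
sum(diag_Sigma)                          # equals J * tau^2, the trace of Sigma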
Jonah
Thanks again - I’ve been studying the help and vignette pages. Sorry to say I have more questions - I’ve numbered them to help keep track:
1) Please let me know if I’m off base here: the group-specific variances (the diagonal of the covariance matrix) are all contained in the simplex vector, and it is this vector over which we specify one prior, the Dirichlet prior.
2) If that’s correct, it would be impossible to set individual Cauchy priors on these variances, no? Can you suggest how to specify a Dirichlet prior in accordance with a regularized half-Cauchy with scale = 2? From the help page, it looks like I’d use either the decov() or dirichlet() functions, but I’m not sure which… would it be something like:
decov(regularization = 1, concentration = 2, shape = 1, scale = 1) or dirichlet(concentration = 2)
A couple other things: 3) I’m using the get_stancode(be_mod1$stanfit) to extract stan output… Is there a specification to format the output? My output is a block of unformatted code.
4) Can you tell me when you expect to update the prior_ops function? I ask because I’m writing up a tutorial on Bayesian hierarchical modeling, and would like to feature rstanarm.
Really appreciate your help!
Hi Chelsea, some comments below:
On Sat, Apr 2, 2016 at 9:36 AM, Chelsea Muth notifications@github.com wrote:
Sorry to say I have more questions
No problem at all!
1) Please let me know if I’m off base here: the group-specific variances (the diagonal of the covariance matrix) are all contained in the simplex vector, and it is this vector over which we specify one prior, the Dirichlet prior.
They're contained in the vector J * tau^2 * pi. Here J is known, tau is a scalar parameter given a gamma prior (its hyperparameters are decov's shape and scale arguments), and the product J * tau^2 represents the total variance (the sum of the individual variances). And pi is the simplex vector whose elements represent the proportion of that total variance attributable to each of the variables (the variables whose covariance matrix we're talking about). pi gets the dirichlet prior. To control its concentration parameter you use the concentration argument to decov().

Essentially, the arguments to decov are the hyperparameters needed for the prior distributions on each of the parameters we use to construct the covariance matrix. The regularization argument is for the eta parameter of the lkj prior on the correlation matrix, concentration is for the dirichlet prior on the simplex vector pi, and shape and scale are for the gamma prior on tau. So the prior on the covariance matrix is composed from these priors on the different elements of its decomposition.
2) If that’s correct, it would be impossible to set individual Cauchy priors on these variances, no? Can you suggest how to specify a Dirichlet prior in accordance with a regularized half-Cauchy with scale = 2? From the help page, it looks like I’d use either the decov() or dirichlet() functions, but I’m not sure which… would it be something like:
In this case, like I mentioned above, you just use the concentration argument to decov(). The dirichlet prior is handled internally so you don't need to explicitly call dirichlet().
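So, in code, question 2) boils down to something like this (hyperparameter values copied from your question, not recommendations):

# dirichlet() is not called directly; the concentration argument below is what
# reaches the Dirichlet prior on the simplex vector pi.
my_cov_prior <- decov(regularization = 1, concentration = 2, shape = 1, scale = 1)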
You're right that it's not possible to get exactly half-Cauchy(0,2) priors on the diagonal elements of the covariance matrix. But all the evidence I've seen favors the decov prior. The half-Cauchy is fine (although you probably want it for the standard deviations and not the variances) but we've found the way the decov prior decomposes the covariance matrix is quite robust (you rarely need to tweak the defaults to get really good results). @bgoodri, is it possible to mess with the decov arguments and get close to what @muthca is looking for?
3) I’m using the get_stancode(be_mod1$stanfit) to extract stan output… Is there a specification to format the output? My output is a block of unformatted code.
I think you can just put the call to get_stancode inside cat() and it should format the text properly.
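For example (be_mod1 being the fitted object from your question):

cat(get_stancode(be_mod1$stanfit))   # prints the Stan program with its line breaks intact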
4) Can you tell me when you expect to update the prior_ops function? I ask because I’m writing up a tutorial on Bayesian hierarchical modeling, and would like to feature rstanarm.
Cool! We'll replace it in the next release or the release after that, which should hopefully be within the month of April. But I don't think prior_ops is particularly important, and probably not worth mentioning in a tutorial unless absolutely necessary for some reason. @bgoodri do you agree?
Really appreciate your help!
Happy to help. Looking forward to seeing the tutorial you're working on. Let us know if you want us to take a look at some point.
Jonah
It is correct that it is not possible to put a half-Cauchy prior on the variances or standard deviations in models with group-specific parameters. That is unfortunate when trying to replicate something in the rethinking package, but it is a net positive when trying to model data. You may be able to get something resembling a half-Cauchy prior by making the scale parameter in decov very large.
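If you want to experiment with that, something along these lines might be a starting point (the value 10 is arbitrary and purely illustrative):

# Larger scale on tau's gamma prior, per the suggestion above; not a default
# or a recommendation, just an example value to try.
heavier_tail_prior <- decov(shape = 1, scale = 10)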
The half-Cauchy prior has long tails that sometimes work well and sometimes lead to sampling problems (depending on the data). The model fitting functions in the rstanarm package, such as stan_glmer, are intended to work well for almost all data that are generated by the model being estimated.
Also I agree with @jgabry that the less said about prior_ops, the better. And if there is something in your tutorial that you think is not adequately addressed by the rstanarm vignettes, consider making a pull request to improve the vignettes.
Thanks so much @jgabry and @bgoodri. I’m grateful for the thorough and timely feedback. Looking forward to pulling this tutorial together and I really appreciate your offer to take a look at it.
Enjoy your weekend, Chelsea
I'm looking for some clarification on prior specification with rstanarm, since I’m unsure about my interpretation of the documentation. In the code below, I am trying to recreate a cubic multilevel model in rstanarm based on an original model that I specified with the rethinking package, which requires individual specification of the priors on each parameter.
Below, have I specified the rstanarm priors to match the rethinking priors? Meaning, I would like NORMAL(0,10) priors on all intercepts and slopes, CAUCHY(0,2) priors on all variance parameters, and a LKJCORR(2) prior on the covariance matrix.
From my interpretation:
'prior =' specifies the distribution of all random slope parameters
'prior_intercept =' specifies the distribution of all random intercept parameters
'prior_ops =' specifies the cauchy shape variable for all variance parameters
'prior_covariance = decov(shape=)' specifies the lkjcorr shape parameter for the covariance matrix.
Is this correct? Is there a different approach to go in and specify priors for each parameter individually?
Thanks very much, Chelsea
CODE FOR COMPARING PRIOR SPECIFICATION IN RSTANARM AND RETHINKING MODELS
library(rethinking)
library(rstan)
library(rstanarm)
1. Cubic unconditional - rstanarm
be_mod3 <- try(stan_lmer(
  cort ~ (1 + time_c + I(time2) + I(time3)) + ((1 + time_c + I(time2) + I(time3)) | id),
  data = dccE,
  REML = FALSE,
  prior = normal(0, 10),
  prior_intercept = normal(0, 10),
  prior_ops = prior_options(prior_scale_for_dispersion = 2),
  prior_covariance = decov(shape = 2)
))
2. Cubic unconditional - rethinking
mcort_c <- map2stan(
  alist(
    cort ~ dnorm(mu, sigma),
    mu <- a_ind[id] + (b_ind[id])*time_c