Open stla opened 4 years ago
Hi @stla, I think the issue here is that the concentration parameter [edit: see below, I confused concentration and regularization] is relevant for the prior on the correlation matrix between the varying intercepts and slopes but your model only has varying intercepts and doesn't have any varying slopes. Here's an example with both varying intercepts and slopes where you can see the effect of the concentration parameter:
# formula with both varying intercepts and slopes
fit <- stan_lmer(mpg ~ (1 + wt|cyl), data = mtcars, prior_covariance = decov(concentration = 10000))
Here is the section of the print output with the hierarchical standard deviations:
Error terms:
Groups Name Std.Dev. Corr
cyl (Intercept) 4.7
wt 4.7 -0.16
Residual 2.7
Num. levels: cyl 3
Hi @jgabry
Thank you. So perhaps I misunderstood the doc or it is not clear. According to my understanding, concentration
is a parameter of the Dirichlet prior distribution on the proportions of the groups-level variances. Then this is what I thought:
data {
int<lower=1> N;
real y[N];
int<lower=1> I;
int<lower=1> J;
int<lower=1> PartID[N];
int<lower=1> OperatorID[N];
int<lower=1> InteractionID[N];
real<lower=0> shape;
real<lower=0> scale;
real<lower=0> concentration;
}
parameters {
real<lower=0> tau;
simplex[2] theta;
vector[I] PartA;
vector[J] Op;
vector[I*J] OpPart;
real<lower=0> sigmaE;
}
transformed parameters {
real<lower=0> sigma_between_total;
vector<lower=0>[2] sigmas_between;
real<lower=0> sigmaO;
real<lower=0> sigmaOP;
sigma_between_total = sqrt(2) * tau;
sigmas_between = sigma_between_total * sqrt(theta);
sigmaO = sigmas_between[1];
sigmaOP = sigmas_between[2];
}
model {
tau ~ gamma(shape, scale);
theta ~ dirichlet(rep_vector(concentration, 2));
Op ~ normal(0, sigmaO);
OpPart ~ normal(0, sigmaOP);
sigmaE ~ cauchy(0, 5);
PartA ~ normal(0, 100);
for(k in 1:N){
y[k] ~ normal(
PartA[PartID[k]] +
Op[OperatorID[k]] +
OpPart[InteractionID[k]],
sigmaE
);
}
}
generated quantities {
real<lower=0> sigmaTotal;
sigmaTotal = sqrt(sigmaE^2 + sigmaO^2 + sigmaOP^2);
}
Is it wrong? If this is wrong, how could we control the prior distribution on the groups-level variances?
Oops, I was thinking about the regularization
parameter (which is for the LKJ prior on the correlation matrix) and not the concentration
parameter, sorry! You're right that concentration
is for the dirichlet. Let me take a closer look at this and get back to you.
how could we control the prior distribution on the groups-level variances
The shape
and scale
arguments to decov()
will also definitely affect the prior so you can use those. But I'm now also wondering, like you, why the concentration parameter only seems to have an affect when there are both varying slopes and intercepts. For the regularization parameter that makes sense, but I thought that the concentration parameter should affect the variances/sds even if just varying intercepts. @bgoodri Is that not the case?
For terms like (1 | g), then the variance-covariance matrix is 1x1 and the Dirichlet prior on the variance proportion would put all of its mass on 1. At that point, the gamma prior with the given shape and scale pertains to the standard deviation of the intercept across levels of g.
On Tue, Nov 10, 2020 at 2:08 PM Jonah Gabry notifications@github.com wrote:
how could we control the prior distribution on the groups-level variances
The shape and scale arguments to decov() will also definitely affect the prior so you can use those. But I'm now also wondering, like you, why the concentration parameter only seems to have an affect when there are both varying slopes and intercepts. For the regularization parameter that makes sense, but I thought that the concentration parameter should affect the variances/sds even if just varying intercepts. @bgoodri https://github.com/bgoodri Is that not the case?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-724906278, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKXPZY2HEG56XKNG4ALSPGFUFANCNFSM4TN5R7VA .
For terms like (1 | g), then the variance-covariance matrix is 1x1
Ok right, thanks. So I was on the right track in my initial comment about the model only containing varying intercepts and not varying slopes, but then I confused myself!
Sorry I don't understand. Here there are two (1|g)
terms. The doc says that the Gamma prior is assigned on the sum of the two corresponding variances (up to a square root somewhere), and that the two proportions of these variances a priori follow the Dirichlet distribution. What am I missing?
And then, how to set the prior distribution on these two variances, if I'm wrong?
@bgoodri given that I seem to be confusing myself on this topic can you clarify here? One thing is that I think the doc is probably using "covariance matrix" to refer both to the full matrix and to the sub-matrices for the different terms, which might be part of the confusion.
If you have multiple lme4-style statements, like (1 | a) and (1 | b), then the two variance-covariance matrices are each 1x1 and independent of each other before seeing the data. If instead you had something like (1 + x | c), then the single variance-covariance matrix is 2x2 and then the prior on the correlation matrix and the two variance proportions comes into play, although they are jointly uniform by default.
On Tue, Nov 10, 2020 at 4:41 PM Jonah Gabry notifications@github.com wrote:
@bgoodri https://github.com/bgoodri given that I seem to be confusing myself on this topic can you clarify here? One thing is that I think the doc is probably using "covariance matrix" to refer both to the full matrix and to the sub-matrices for the different terms, which might be part of the confusion.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-724983463, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQZH2KUIKZU4IE5VALSPGXP5ANCNFSM4TN5R7VA .
So, if there are two (1|g)
terms, rstanarm
assigns the same Gamma prior distribution on the corresponding standard deviations?
You might be able to pass 2 shapes and / or scales, but the gamma distribution is used for both.
On Thu, Nov 12, 2020, 7:21 AM stla notifications@github.com wrote:
So, if there are two (1|g) terms, rstanarm assigns the same Gamma prior distribution on the corresponding standard deviations?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726044561, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKX4AHFKSP6DKQSMQZTSPPHL5ANCNFSM4TN5R7VA .
I tried something like prior_covariance = decov(shape = c(50,1), scale = c(1,1))
but I didn't obtain expected results. How should one specify multiple Gamma priors?
That was supposed to work.
On Thu, Nov 12, 2020 at 9:59 AM stla notifications@github.com wrote:
I tried something like prior_covariance = decov(shape = c(50,1), scale = c(1,1)) but I didn't obtain expected results. How should one specify multiple Gamma priors?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726130695, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQZMJNTTOFCUVW2EE3SPPZ6FANCNFSM4TN5R7VA .
Yes, sorry, this works actually.
Both the R documentation and the online documentation do not mention this possibility.
And also, there's no way (?) to know in advance the order of the standard deviations (I mean which one comes first, etc).
I would guess that it is in the same order that they appear in the formula, but I wouldn't be too surprised if lme4 changes the order.
On Thu, Nov 12, 2020 at 12:26 PM stla notifications@github.com wrote:
And also, there's no way (?) to know in advance the order of the standard deviations (I mean which one comes first, etc).
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726221845, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQF5NY5IBACSY66SUTSPQLDJANCNFSM4TN5R7VA .
it is in the same order that they appear in the formula
No :)
Hello,
Here is a model with two between variances:
From my understanding, if the
concentration
parameter is high (here 1e6), the two between variances should be equal. But this is not the case:And I don't see any significant difference when I set it to
1
.To reproduce the data: