stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
388 stars 133 forks source link

The `concentration` parameter in `decov` has no effect. #483

Open stla opened 4 years ago

stla commented 4 years ago

Hello,

Here is a model with two between variances:

library(rstanarm)
rstanarm <- stan_lmer(
  y ~  0 + Part + (1|Operator) + (1|Operator:Part), data = dat,
  prior = normal(0, 100),
  prior_aux = cauchy(0, 5),
  prior_covariance = decov(concentration = 1000000, shape = 1, scale = 1/1),
  iter = 10000
)
posterior_interval(rstanarm, prob = 95/100)

From my understanding, if the concentration parameter is high (here 1e6), the two between variances should be equal. But this is not the case:

                                                      2.5%      97.5%
PartA1                                        7.6725149693 10.9984804
PartA2                                       17.7830860979 21.1723343
...
sigma                                         1.5266337590  2.3625997
Sigma[Operator:Part:(Intercept),(Intercept)]  0.0004819209  3.6359430
Sigma[Operator:(Intercept),(Intercept)]       0.0251682030 10.6149320

And I don't see any significant difference when I set it to 1.

To reproduce the data:

SimAV2mixed <- function(I, J, Kij, mu=0, alphai, sigmaO=1,
                        sigmaPO=1, sigmaE=1, factor.names=c("Part","Operator"),
                        resp.name="y", keep.intermediate=FALSE){
  Operator <- rep(1:J, each=I)
  Oj <- rep(rnorm(J, 0, sigmaO), each=I)
  Part <- rep(1:I, times=J)
  Pi <- rep(alphai, times=J)
  POij <- rnorm(I*J, 0, sigmaPO)
  simdata0 <- data.frame(Part, Operator, Pi, Oj, POij)
  simdata0$Operator <- factor(simdata0$Operator)
  levels(simdata0$Operator) <- 
    sprintf(paste0("%0", floor(log10(J))+1, "d"), 1:J)
  simdata0$Part <- factor(simdata0$Part)
  levels(simdata0$Part) <- sprintf(paste0("%0", floor(log10(I))+1, "d"), 1:I)
  simdata <- 
    as.data.frame(
      sapply(simdata0, function(v) rep(v, times=Kij), simplify=FALSE))
  Eijk <- rnorm(sum(Kij), 0, sigmaE)
  simdata <- cbind(simdata, Eijk)
  simdata[[resp.name]] <- mu + with(simdata, Oj+Pi+POij+Eijk)
  levels(simdata[,1]) <- paste0("A", levels(simdata[,1]))
  levels(simdata[,2]) <- paste0("B", levels(simdata[,2]))
  names(simdata)[1:2] <- factor.names
  if(!keep.intermediate) simdata <- simdata[,c(factor.names,resp.name)]
  simdata
}

set.seed(666)  
I = 2; J = 6; Kij = rpois(I*J, 1) + 3
alphai <- c(10, 20)
sigmaO <- 1
sigmaPO <- 0.5
sigmaE <- 2
dat <- SimAV2mixed(I, J, Kij, mu = 0, alphai = alphai, 
                   sigmaO = sigmaO, sigmaPO = sigmaPO, sigmaE = sigmaE)
jgabry commented 4 years ago

Hi @stla, I think the issue here is that the concentration parameter [edit: see below, I confused concentration and regularization] is relevant for the prior on the correlation matrix between the varying intercepts and slopes but your model only has varying intercepts and doesn't have any varying slopes. Here's an example with both varying intercepts and slopes where you can see the effect of the concentration parameter:

# formula with both varying intercepts and slopes
fit <- stan_lmer(mpg ~ (1 + wt|cyl), data = mtcars, prior_covariance = decov(concentration = 10000))

Here is the section of the print output with the hierarchical standard deviations:

Error terms:
 Groups   Name        Std.Dev. Corr 
 cyl      (Intercept) 4.7           
          wt          4.7      -0.16
 Residual             2.7           
Num. levels: cyl 3 
stla commented 4 years ago

Hi @jgabry

Thank you. So perhaps I misunderstood the doc or it is not clear. According to my understanding, concentration is a parameter of the Dirichlet prior distribution on the proportions of the groups-level variances. Then this is what I thought:

data {
  int<lower=1> N;
  real y[N];
  int<lower=1> I;
  int<lower=1> J;
  int<lower=1> PartID[N];
  int<lower=1> OperatorID[N];
  int<lower=1> InteractionID[N];
  real<lower=0> shape;
  real<lower=0> scale;
  real<lower=0> concentration;
}

parameters {
  real<lower=0> tau;
  simplex[2] theta;
  vector[I] PartA;
  vector[J] Op;
  vector[I*J] OpPart;
  real<lower=0> sigmaE;
}

transformed parameters {
  real<lower=0> sigma_between_total;
  vector<lower=0>[2] sigmas_between;
  real<lower=0> sigmaO;
  real<lower=0> sigmaOP;
  sigma_between_total = sqrt(2) * tau; 
  sigmas_between = sigma_between_total * sqrt(theta);
  sigmaO = sigmas_between[1];
  sigmaOP = sigmas_between[2];
}

model {
  tau ~ gamma(shape, scale);
  theta ~ dirichlet(rep_vector(concentration, 2));
  Op ~ normal(0, sigmaO);
  OpPart ~ normal(0, sigmaOP);
  sigmaE ~ cauchy(0, 5);
  PartA ~ normal(0, 100); 
  for(k in 1:N){
    y[k] ~ normal(
      PartA[PartID[k]] + 
        Op[OperatorID[k]] + 
          OpPart[InteractionID[k]], 
      sigmaE
    );
  }
}

generated quantities {
  real<lower=0> sigmaTotal;
  sigmaTotal = sqrt(sigmaE^2 + sigmaO^2 + sigmaOP^2);
}

Is it wrong? If this is wrong, how could we control the prior distribution on the groups-level variances?

jgabry commented 4 years ago

Oops, I was thinking about the regularization parameter (which is for the LKJ prior on the correlation matrix) and not the concentration parameter, sorry! You're right that concentration is for the dirichlet. Let me take a closer look at this and get back to you.

jgabry commented 4 years ago

how could we control the prior distribution on the groups-level variances

The shape and scale arguments to decov() will also definitely affect the prior so you can use those. But I'm now also wondering, like you, why the concentration parameter only seems to have an affect when there are both varying slopes and intercepts. For the regularization parameter that makes sense, but I thought that the concentration parameter should affect the variances/sds even if just varying intercepts. @bgoodri Is that not the case?

bgoodri commented 4 years ago

For terms like (1 | g), then the variance-covariance matrix is 1x1 and the Dirichlet prior on the variance proportion would put all of its mass on 1. At that point, the gamma prior with the given shape and scale pertains to the standard deviation of the intercept across levels of g.

On Tue, Nov 10, 2020 at 2:08 PM Jonah Gabry notifications@github.com wrote:

how could we control the prior distribution on the groups-level variances

The shape and scale arguments to decov() will also definitely affect the prior so you can use those. But I'm now also wondering, like you, why the concentration parameter only seems to have an affect when there are both varying slopes and intercepts. For the regularization parameter that makes sense, but I thought that the concentration parameter should affect the variances/sds even if just varying intercepts. @bgoodri https://github.com/bgoodri Is that not the case?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-724906278, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKXPZY2HEG56XKNG4ALSPGFUFANCNFSM4TN5R7VA .

jgabry commented 4 years ago

For terms like (1 | g), then the variance-covariance matrix is 1x1

Ok right, thanks. So I was on the right track in my initial comment about the model only containing varying intercepts and not varying slopes, but then I confused myself!

stla commented 4 years ago

Sorry I don't understand. Here there are two (1|g) terms. The doc says that the Gamma prior is assigned on the sum of the two corresponding variances (up to a square root somewhere), and that the two proportions of these variances a priori follow the Dirichlet distribution. What am I missing? And then, how to set the prior distribution on these two variances, if I'm wrong?

jgabry commented 4 years ago

@bgoodri given that I seem to be confusing myself on this topic can you clarify here? One thing is that I think the doc is probably using "covariance matrix" to refer both to the full matrix and to the sub-matrices for the different terms, which might be part of the confusion.

bgoodri commented 4 years ago

If you have multiple lme4-style statements, like (1 | a) and (1 | b), then the two variance-covariance matrices are each 1x1 and independent of each other before seeing the data. If instead you had something like (1 + x | c), then the single variance-covariance matrix is 2x2 and then the prior on the correlation matrix and the two variance proportions comes into play, although they are jointly uniform by default.

On Tue, Nov 10, 2020 at 4:41 PM Jonah Gabry notifications@github.com wrote:

@bgoodri https://github.com/bgoodri given that I seem to be confusing myself on this topic can you clarify here? One thing is that I think the doc is probably using "covariance matrix" to refer both to the full matrix and to the sub-matrices for the different terms, which might be part of the confusion.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-724983463, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQZH2KUIKZU4IE5VALSPGXP5ANCNFSM4TN5R7VA .

stla commented 4 years ago

So, if there are two (1|g) terms, rstanarm assigns the same Gamma prior distribution on the corresponding standard deviations?

bgoodri commented 4 years ago

You might be able to pass 2 shapes and / or scales, but the gamma distribution is used for both.

On Thu, Nov 12, 2020, 7:21 AM stla notifications@github.com wrote:

So, if there are two (1|g) terms, rstanarm assigns the same Gamma prior distribution on the corresponding standard deviations?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726044561, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKX4AHFKSP6DKQSMQZTSPPHL5ANCNFSM4TN5R7VA .

stla commented 4 years ago

I tried something like prior_covariance = decov(shape = c(50,1), scale = c(1,1)) but I didn't obtain expected results. How should one specify multiple Gamma priors?

bgoodri commented 4 years ago

That was supposed to work.

On Thu, Nov 12, 2020 at 9:59 AM stla notifications@github.com wrote:

I tried something like prior_covariance = decov(shape = c(50,1), scale = c(1,1)) but I didn't obtain expected results. How should one specify multiple Gamma priors?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726130695, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQZMJNTTOFCUVW2EE3SPPZ6FANCNFSM4TN5R7VA .

stla commented 4 years ago

Yes, sorry, this works actually.

Both the R documentation and the online documentation do not mention this possibility.

stla commented 4 years ago

And also, there's no way (?) to know in advance the order of the standard deviations (I mean which one comes first, etc).

bgoodri commented 4 years ago

I would guess that it is in the same order that they appear in the formula, but I wouldn't be too surprised if lme4 changes the order.

On Thu, Nov 12, 2020 at 12:26 PM stla notifications@github.com wrote:

And also, there's no way (?) to know in advance the order of the standard deviations (I mean which one comes first, etc).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726221845, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQF5NY5IBACSY66SUTSPQLDJANCNFSM4TN5R7VA .

stla commented 4 years ago

it is in the same order that they appear in the formula

No :)