vdorie / blme

Bayesian Linear Mixed Effect Models

Non-zero prior mean for fixed-effects #10

Open randel opened 4 years ago

randel commented 4 years ago

Hello! For fixef.prior, the documented format is normal(sd = c(10, 2.5), cov, common.scale = TRUE), and the documentation says "Normal priors are constrained to have a mean of 0 - non-zero priors are equivalent to shifting covariates." Could you explain why non-zero priors are equivalent to shifting covariates?

Or how hard would it be to allow non-zero prior means for the fixed effects? Do you have any idea how a prior mean for the fixed effects could be incorporated? Thanks!

vdorie commented 4 years ago

Sorry that I've taken a while to respond, but I'm afraid I've forgotten most of the math I would need to check whether this is easy to implement.

Normal priors for fixed effects were implemented as pseudo-data, so they take very little extra time to compute. They're equivalent to shifting covariates just because both are linear transformations, and one can come up with a linear model that would give the same predictions/have the same maximum. It's a bit messy with the hierarchical components, but the scale on those has always mattered, even in a non-Bayesian setting.
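As a rough sketch of that equivalence (illustrative only; mu below is a hypothetical prior mean, and the code simply re-centers the outcome by X %*% mu before applying the built-in zero-mean normal prior):

library(blme)
data(sleepstudy, package = "lme4")

mu <- c(250, 10)                        # hypothetical prior means for (Intercept, Days)
X  <- model.matrix(~ Days, sleepstudy)  # fixed-effects design matrix

# shift the outcome by X %*% mu, then use the zero-mean normal prior
sleepstudy$Reaction.shifted <- sleepstudy$Reaction - drop(X %*% mu)
fit.shifted <- blmer(Reaction.shifted ~ Days + (1 + Days | Subject), sleepstudy,
                     REML = FALSE, cov.prior = NULL,
                     fixef.prior = normal(sd = c(10, 2.5), common.scale = FALSE))

# adding mu back gives (up to optimizer behavior) the estimates one would get
# under a normal prior with mean mu and sd c(10, 2.5) on the original problem;
# the random-effect and residual estimates are unaffected by the shift
fixef(fit.shifted) + mu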

For priors on the fixed effects, I think it is possible to implement non-zero normal priors - again, I'd have to do some digging - but you can also just use a t distribution. That prior is parameterized by its mean, and if you set the degrees of freedom high enough it will give an answer equivalent to a normal one.

Oh, and I'll take a look at the normal stuff, but it might take me a few weeks.

randel commented 4 years ago

Thanks a lot for your response, @vdorie !

I am wondering how blme can incorporate informative priors. As shown in the following example, blmer gives almost the same estimates as lmer even with a strong prior on the fixed effects, while MCMCglmm does reflect the impact of the informative prior. Or am I misunderstanding something?

library(lme4)
summary(fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE))

# Fixed effects:
#   Estimate Std. Error t value
# (Intercept)  251.405      6.632  37.907
# Days          10.467      1.502   6.968

library(blme)
summary(fm2 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, cov.prior = NULL, REML = FALSE,
                     fixef.prior = t(df = 30, mean = 1, scale = c(.5, .5), common.scale = F)))
# Fixed effects:
#             Estimate Std. Error t value
# (Intercept)  245.402      6.783  36.178
# Days          10.642      1.503   7.082

library(MCMCglmm)
set.seed(1)
summary(fm3 <- MCMCglmm(Reaction ~ Days, random = ~us(1 + Days):Subject, data = sleepstudy, verbose = F,
                        prior = list(B = list(mu = rep(1, 2), V = diag(.5, 2)),
                                     G = list(G1 = list(nu = 0, V = diag(2))))))
# Location effects: Reaction ~ Days 
#             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
# (Intercept)    1.0486  -0.3701   2.2964     1000  0.13
# Days           1.0339  -0.3787   2.3946     1000  0.17
vdorie commented 4 years ago

If I had to guess, this has to do with the differences between optimization and sampling, as well as starting points. MCMCglmm likely isn't able to move from the area where the prior has high density to where the likelihood does, regardless of their relative sizes. For example, when I do

nu <- seq(10, 100, 5)
beta.1 <- sapply(nu, function(nu.i) {
  fixef.prior <- paste0("t(df = ", nu.i, ", mean = 1, scale = c(0.5, 0.5), common.scale = F)")
  suppressWarnings(
    blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, cov.prior = NULL, REML = FALSE,
          fixef.prior = fixef.prior)@beta[2])
  })

plot(nu, beta.1, type = "l")

the slope estimate stays near the maximum likelihood value until it suddenly jumps to 1. Along the way, the optimizer often complains about non-convergence. I'm sure I could also get different results if I supplied different initial conditions.

If you have an informative prior that is on the scale of the new data, it should work just fine.
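For instance, something like the following (illustrative values only; I'm assuming the t prior's mean argument accepts a per-coefficient vector, the way scale does):

library(blme)
summary(blmer(Reaction ~ Days + (1 + Days | Subject), sleepstudy,
              cov.prior = NULL, REML = FALSE,
              fixef.prior = t(df = 30, mean = c(250, 5), scale = c(10, 1),
                              common.scale = FALSE)))

With the prior located on the same scale as the data, the optimizer shouldn't struggle, and the Days estimate should end up pulled part of the way from the ML value (about 10.5) toward the prior mean of 5, rather than ignoring the prior entirely.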

randel commented 4 years ago

This sounds interesting! So increasing the degrees of freedom of the multivariate t distribution can increase how strongly the prior is weighted.

Here is another example, with an informative prior on the covariance matrix of the random effects. Do you have any suggestions for making an informative cov.prior play a role in the estimation? Thanks!

library(lme4)
summary(fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE))

# Random effects:
#   Groups   Name        Variance Std.Dev. Corr
# Subject  (Intercept)   565.48   23.780       
# Days                    32.68    5.717   0.08

library(blme)
summary(fm2 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE,
                     cov.prior = invwishart(df = 50, scale = diag(2))))

# Groups   Name        Variance Std.Dev. Corr
# Subject  (Intercept) 705.15   26.555       
# Days                  14.42    3.797   0.08

library(MCMCglmm)
set.seed(1)
summary(fm4 <- MCMCglmm(Reaction ~ Days, random = ~us(1 + Days):Subject, data = sleepstudy, verbose = F,
                        prior = list(G = list(G1 = list(nu = 50, V = diag(2))))))

#                                 post.mean l-95% CI u-95% CI eff.samp
# (Intercept):(Intercept).Subject     1.700   0.7299    3.367    562.9
# (Intercept):Days.Subject            1.880  -0.7914    4.976    517.3
# Days:Days.Subject                   8.791   3.9993   13.485   1000.0
vdorie commented 4 years ago

Those priors are more like indications that the new data come from a different distribution than the old one. When I try to fit them, I get convergence warnings, which indicates that the posterior is very flat over that region. That the algorithm ends up where it does is, again, a byproduct of the starting points.
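To put a number on that mismatch (assuming the standard inverse-Wishart parameterization, in which the prior mean of the covariance matrix is scale / (df - p - 1)):

nu <- 50; Psi <- diag(2); p <- nrow(Psi)
Psi / (nu - p - 1)  # prior mean covariance: about 0.043 on each variance

That is several orders of magnitude below the ML variance estimates (roughly 565 and 33), so the prior and the likelihood concentrate in very different regions.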

library(lme4)
summary(fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE))

library(blme)
summary(fm2 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE,
                     cov.prior = invwishart(df = 50, scale = diag(2))))

# move closer to prior
# note: no convergence warning
summary(fm3 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE,
                     cov.prior = invwishart(df = 50, scale = diag(2)),
                     start = c(0.1, fm1@theta[2:3])))

# Groups   Name        Variance Std.Dev. Corr
# Subject  (Intercept)  18.47    4.298       
#          Days         22.92    4.787   0.14
# Residual             903.46   30.058

# to show the issue, we plot the prior, likelihood, and posterior as curves over
# the first component of the variance decomposition, holding the other components
# at their MLEs

dev1 <-  lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE, devFunOnly = TRUE)
dev2 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE,
              cov.prior = invwishart(df = 50, scale = diag(2)), devFunOnly = TRUE)

# here's how you would calculate the prior, although it's not quite
# right because it doesn't include the degrees of freedom adjustment
# for the residual variance
nu <- 50
Psi <- diag(2)
p <- ncol(Psi)

priorConstant <- 0.5 * nu * (determinant(Psi)$modulus[1] - p * log(2)) - 
                 0.25 * p * (p - 1) * log(pi)
for (i in seq_len(p))
  priorConstant <- priorConstant - lgamma(0.5 * (nu + 1 - i))
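# (priorConstant is the log normalizing constant of the inverse-Wishart density;
#  it is not used below because each curve is renormalized numerically anyway)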

Psi.l <- t(chol(Psi))

prior <- function(x, nu, Psi.l) {
  x.r <- chol(x)
  -(nu + p + 1) * sum(log(diag(x.r))) - 0.5 * sum(solve(x.r, Psi.l)^2)
}

wrapper.prior <- function(x)
  sapply(x, function(x.i) {
    # lower-triangular Cholesky factor of the relative covariance, as in lmer's theta
    L <- matrix(c(x.i, fm1@theta[2], 0, fm1@theta[3]), 2)
    prior(tcrossprod(L), nu, Psi.l)
  })

# the version below, which recovers the prior as the difference between the two
# deviance functions, can be checked against the analytic prior above and should
# be quite similar (up to scaling)
wrapper.prior <- function(x)
  sapply(x, function(x.i) -0.5 * (dev2(c(x.i, fm1@theta[2:3])) -
                                  dev1(c(x.i, fm1@theta[2:3]))))

wrapper.like <- function(x)
  sapply(x, function(x.i) -0.5 * dev1(c(x.i, fm1@theta[2:3])))

wrapper.post <- function(x)
  sapply(x, function(x.i) -0.5 * dev2(c(x.i, fm1@theta[2:3])))

x <- seq(0.01, 2, length.out = 101)
y.like <- wrapper.like(x)
y.like <- exp(y.like - max(y.like)) # subtract the max to stabilize exp
y.like <- y.like / (sum(diff(x) * 0.5 * (y.like[-101] + y.like[-1]))) # normalize via trapezoidal integration

y.prior <- wrapper.prior(x)
y.prior <- exp(y.prior - max(y.prior))
y.prior <- y.prior / (sum(diff(x) * 0.5 * (y.prior[-101] + y.prior[-1])))

y.post  <- wrapper.post(x)
y.post <- exp(y.post - max(y.post))
y.post <- y.post / (sum(diff(x) * 0.5 * (y.post[-101] + y.post[-1])))

plot(x, y.like, type = "l", ylim = range(y.like, y.prior, y.post))
lines(x, y.prior, lty = 2)
lines(x, y.post, col = "gray")

# posterior mode reached from the default starting point; the prior is
# essentially flat in the region of the likelihood
abline(v = fm2@theta[1], xpd = FALSE)

# posterior mode when the starting point is moved over toward the prior
abline(v = fm3@theta[1], xpd = FALSE, col = "gray")
randel commented 4 years ago

Thanks so much for your detailed explanation!