randel opened this issue 5 years ago
Sorry that I've taken a while to respond, but I'm afraid I've forgotten most of the math I would need to check whether this is easy to implement.
Normal priors for fixed effects were implemented as pseudo-data, so they take very little extra time to compute. They're equivalent to shifting covariates because both are linear transformations, and one can come up with a linear model that would give the same predictions/have the same maximum. It's a bit messy with the hierarchical components, but the scale on those has always mattered, even in a non-Bayesian setting.
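To make the pseudo-data point concrete, here is a minimal sketch using plain lm() rather than blme's internals; the numbers are made up. With the residual standard deviation sigma treated as known, a normal prior with standard deviation s on each coefficient amounts to appending one pseudo-observation per coefficient, which is why it costs almost nothing extra; a non-zero prior mean mu only changes the pseudo-responses, which is the same thing as absorbing X %*% mu into the response (the "shift" mentioned above).
set.seed(1)
sigma <- 1                                 # residual sd, treated as known here
s     <- 0.5                               # prior sd for each coefficient
X     <- cbind(1, rnorm(100))              # toy design matrix
y     <- drop(X %*% c(2, 3)) + rnorm(100, sd = sigma)
# zero-mean prior: pseudo-rows (sigma / s) * I with pseudo-responses of 0
X.aug <- rbind(X, (sigma / s) * diag(2))
coef(lm(c(y, 0, 0) ~ X.aug - 1))           # ridge-style posterior mode
# a prior mean mu only changes the pseudo-responses; equivalently, one can
# subtract X %*% mu from y and keep the zero-mean prior
mu <- c(2, 3)
coef(lm(c(y, (sigma / s) * mu) ~ X.aug - 1))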
For priors on the fixed effects I think it is possible to implement non-zero normal priors - again, I'd have to do some digging - but you can also just use a t distribution. That prior is parameterized by its mean, and if you set the degrees of freedom high enough it will give an answer equivalent to a normal prior's.
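A sketch of that workaround, with prior values taken from the example below purely for illustration: a high-df t prior centered at mu, and the same prior recentered at zero after subtracting X %*% mu from the response, target the same posterior, so the two fits should agree (the optimizer can still land in different modes when the prior and likelihood are far apart, as with the df sweep elsewhere in this thread).
library(blme)
data(sleepstudy, package = "lme4")
mu <- c(1, 1)                                    # illustrative prior means
# (a) t prior with a large df, centered at mu, standing in for a normal prior
fm.t <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, cov.prior = NULL, REML = FALSE,
              fixef.prior = t(df = 200, mean = 1, scale = c(0.5, 0.5), common.scale = FALSE))
# (b) same prior recentered at zero after absorbing X %*% mu into the response
X <- model.matrix(~ Days, sleepstudy)
sleepstudy$shifted <- sleepstudy$Reaction - drop(X %*% mu)
fm.s <- blmer(shifted ~ Days + (1 + Days|Subject), sleepstudy, cov.prior = NULL, REML = FALSE,
              fixef.prior = t(df = 200, mean = 0, scale = c(0.5, 0.5), common.scale = FALSE))
fixef(fm.t)       # the two sets of estimates should match, up to which
fixef(fm.s) + mu  # posterior mode the optimizer lands in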
Oh, and I'll take a look at the normal stuff, but it might take me a few weeks.
Thanks a lot for your response, @vdorie!
I am wondering how blme can incorporate informative priors. As shown in the following example, blmer gives almost the same results as lmer even with a strong prior on the fixed effects, while MCMCglmm does reflect the impact of the informative prior. Or is there some misunderstanding on my part?
library(lme4)
summary(fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE))
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) 251.405 6.632 37.907
# Days 10.467 1.502 6.968
library(blme)
summary(fm2 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, cov.prior = NULL, REML = FALSE,
                     fixef.prior = t(df = 30, mean = 1, scale = c(.5, .5), common.scale = FALSE)))
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) 245.402 6.783 36.178
# Days 10.642 1.503 7.082
library(MCMCglmm)
set.seed(1)
summary(fm3 <- MCMCglmm(Reaction ~ Days, random = ~us(1 + Days):Subject, data = sleepstudy, verbose = FALSE,
                        prior = list(B = list(mu = rep(1, 2), V = diag(.5, 2)),
                                     G = list(G1 = list(nu = 0, V = diag(2))))))
# Location effects: Reaction ~ Days
# post.mean l-95% CI u-95% CI eff.samp pMCMC
# (Intercept) 1.0486 -0.3701 2.2964 1000 0.13
# Days 1.0339 -0.3787 2.3946 1000 0.17
If I had to guess, this has to do with the differences between optimization and sampling, as well as starting points. MCMCglmm likely isn't able to move from the area where the prior has high density to where the likelihood does, regardless of their relative sizes. For example, when I do
nu <- seq(10, 100, 5)
beta.1 <- sapply(nu, function(nu.i) {
  fixef.prior <- paste0("t(df = ", nu.i, ", mean = 1, scale = c(0.5, 0.5), common.scale = FALSE)")
  suppressWarnings(
    blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, cov.prior = NULL, REML = FALSE,
          fixef.prior = fixef.prior)@beta[2])
})
plot(nu, beta.1, type = "l")
the slope estimate stays near the maximum likelihood estimate until it suddenly jumps to 1. Along the way, the optimizer often complains about non-convergence. I'm sure I could also get different results if I supplied different starting values.
If you have an informative prior that is on the scale of the new data, it should work just fine.
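For example, something along these lines (the prior values here are made up, and I'm assuming mean accepts a per-coefficient vector the same way scale does; if it doesn't, the shifting trick above would be needed instead):
library(blme)
data(sleepstudy, package = "lme4")
# hypothetical informative prior centered roughly on the scale of the data
fm.ok <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, cov.prior = NULL, REML = FALSE,
               fixef.prior = t(df = 30, mean = c(240, 8), scale = c(5, 1), common.scale = FALSE))
fixef(fm.ok)  # pulled toward (240, 8), with no convergence warning expected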
This sounds interesting! So increasing the degrees of freedom of the multivariate t distribution makes the prior more informative.
Here is another example, with an informative prior on the covariance matrix of the random effects. Do you have any suggestions for getting an informative cov.prior to play a role in the estimation? Thanks!
library(lme4)
summary(fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE))
# Random effects:
# Groups Name Variance Std.Dev. Corr
# Subject (Intercept) 565.48 23.780
# Days 32.68 5.717 0.08
library(blme)
summary(fm2 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE,
                     cov.prior = invwishart(df = 50, scale = diag(2))))
# Groups Name Variance Std.Dev. Corr
# Subject (Intercept) 705.15 26.555
# Days 14.42 3.797 0.08
library(MCMCglmm)
set.seed(1)
summary(fm4 <- MCMCglmm(Reaction ~ Days, random = ~us(1 + Days):Subject, data = sleepstudy, verbose = FALSE,
                        prior = list(G = list(G1 = list(nu = 50, V = diag(2))))))
# post.mean l-95% CI u-95% CI eff.samp
# (Intercept):(Intercept).Subject 1.700 0.7299 3.367 562.9
# (Intercept):Days.Subject 1.880 -0.7914 4.976 517.3
# Days:Days.Subject 8.791 3.9993 13.485 1000.0
Those priors are more like indications that the new data come from a different distribution than the old data. When I try to fit them, I get convergence warnings, which indicates that the posterior is very flat over that region. That the algorithm ends up where it does is, again, a byproduct of the starting point.
library(lme4)
summary(fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE))
library(blme)
summary(fm2 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE,
                     cov.prior = invwishart(df = 50, scale = diag(2))))
# move the starting point closer to the prior
# note: no convergence warning
summary(fm3 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE,
                     cov.prior = invwishart(df = 50, scale = diag(2)),
                     start = c(0.1, fm1@theta[2:3])))
# Groups Name Variance Std.Dev. Corr
# Subject (Intercept) 18.47 4.298
# Days 22.92 4.787 0.14
# Residual 903.46 30.058
# to show the issue, we plot curves of the prior, likelihood, and posterior as
# functions of the first component of the variance decomposition, holding the
# others at their MLEs
dev1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE, devFunOnly = TRUE)
dev2 <- blmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy, REML = FALSE,
              cov.prior = invwishart(df = 50, scale = diag(2)), devFunOnly = TRUE)
# here's how you would calculate the prior, although it's not quite
# right because it doesn't include the degrees of freedom adjustment
# for the residual variance
nu <- 50
Psi <- diag(2)
p <- ncol(Psi)
priorConstant <- 0.5 * nu * (determinant(Psi)$modulus[1] - p * log(2)) -
  0.25 * p * (p - 1) * log(pi)
for (i in seq_len(p))
  priorConstant <- priorConstant - lgamma(0.5 * (nu + 1 - i))
Psi.l <- t(chol(Psi))
prior <- function(x, nu) {
  x.r <- chol(x)
  -(nu + p + 1) * sum(log(diag(x.r))) - 0.5 * sum(solve(x.r, Psi.l)^2)
}
wrapper.prior <- function(x)
  sapply(x, function(x.i) {
    # lower-triangular Cholesky factor implied by the theta parameterization
    L <- matrix(c(x.i, fm1@theta[2], 0, fm1@theta[3]), 2)
    prior(tcrossprod(L), nu)
  })
# this can be checked with the prior above and should be
# quite similar (with scaling)
wrapper.prior <- function(x)
  sapply(x, function(x.i) -0.5 * (dev2(c(x.i, fm1@theta[2:3])) -
                                  dev1(c(x.i, fm1@theta[2:3]))))
wrapper.like <- function(x)
  sapply(x, function(x.i) -0.5 * dev1(c(x.i, fm1@theta[2:3])))
wrapper.post <- function(x)
  sapply(x, function(x.i) -0.5 * dev2(c(x.i, fm1@theta[2:3])))
x <- seq(0.01, 2, length.out = 101)
y.like <- wrapper.like(x)
y.like <- exp(y.like - max(y.like)) # scale to stabilize exp
y.like <- y.like / (sum(diff(x) * 0.5 * (y.like[-101] + y.like[-1]))) # integrate
y.prior <- wrapper.prior(x)
y.prior <- exp(y.prior - max(y.prior))
y.prior <- y.prior / (sum(diff(x) * 0.5 * (y.prior[-101] + y.prior[-1])))
y.post <- wrapper.post(x)
y.post <- exp(y.post - max(y.post))
y.post <- y.post / (sum(diff(x) * 0.5 * (y.post[-101] + y.post[-1])))
plot(x, y.like, type = "l", ylim = range(y.like, y.prior, y.post))
lines(x, y.prior, lty = 2)
lines(x, y.post, col = "gray")
# posterior mode found from the default start, where the prior is flat over the region of the likelihood
abline(v = fm2@theta[1], xpd = FALSE)
# posterior mode found when the starting point is moved over toward the prior
abline(v = fm3@theta[1], xpd = FALSE, col = "gray")
Thanks so much for your detailed explanation!
Hello! For fixef.prior, the format is normal(sd = c(10, 2.5), cov, common.scale = TRUE), and the documentation says "Normal priors are constrained to have a mean of 0 - non-zero priors are equivalent to shifting covariates." Could you explain why non-zero priors are equivalent to shifting covariates? Or how hard would it be to allow non-zero fixed-effects priors? Do you have any clue how to incorporate a prior mean for the fixed effects? Thanks!