tmalsburg / MCMCglmm-intro

A tutorial showing how to set up a Bayesian "lmer" model using MCMCglmm
56 stars 12 forks source link

Specify priors in MCMCglmm #1

Closed mikeyEcology closed 5 years ago

mikeyEcology commented 5 years ago

Hi, I'm new to this package; I usually use Stan for Bayesian models, but I'm using a lot of data and hoping I can get the models to run faster in MCMCglmm. I have read the course notes as well as the readme on this page. They are really helpful and great descriptions, but I still do not understand how to set priors. I have a simple mixed effects model, and here is some example code.

library(MCMCglmm)
# inverse logit function for simulation
inv.logit <- function(x){
  exp(x)/(exp(x)+1)
}
# simulate some data
set.seed(123)
n <- 1000
covariates <- replicate(3, rnorm(n, 0, .5))
X <- cbind(rep(1, length(covariates[,1])),covariates)
colnames(X) <- c('int', 'X1', 'X2', 'X3')
coefs <- c(1, -1, -.5, 0.3)
resp <- X %*% coefs
psi <- inv.logit(resp)
ind <- rep(1:10, floor(n/10)) # assigning individuals for random effect, but it's not important
n_inds <- length(unique(ind))
y <- rbinom(n, 1, psi)
df <- data.frame(ind, y, X)

# priors: I really don't know what I'm doing here
prior1 <- list(R = list(V = 2, n = 1, fix=1),
               G = list(G1 = list(V = diag(3), n = 3)))

# run the model
glmm <- MCMCglmm(y ~ X1 + X2 + X3,
                 random= ~ us(1 + ind), # having random slope and random intercept for ind
                 family = "categorical",
                 data = df,
                 prior=prior1
                 )

If I were running this in Stan, I would set priors for each fixed effect covariate Beta ~ normal(0, sigma_beta) and the hyperprior on each sigma_beta ~ gamma(2,1). (Although, I'd also be happy just setting the prior as Beta ~ normal(0,100) or something like this.) I would use similar priors for the random effects. I understand that MCMCglmm is more limited in distributions, but I really don't understand the notation. Is there somewhere I can find a definition of what is exactly meant by each of the values that goes into the prior (e.g., V, n, alpha, ...) and how these values correspond to what we would write in a full model description of the priors? Or is someone willing to explain it to more simple minded people like me?

tmalsburg commented 5 years ago

Hi Mickey, I haven't used MCMCglmm since I wrote this tutorial. I completely switched to Stan and I'm happy with it. So I'm afraid I can't help you with the prior specifications. One drawback of MCMCglmm is indeed that incredibly poor documentation. That alone was reason enough for me to switch.

May I ask why you think that MCMCglmm might be faster? Time per sample is probably better since MCMCglmm uses Gibss (if I remember correctly), but time per effective sample should on average be better with HMC in Stan. If sampling is slow in Stan this can usually be addressed by reparameterizing and choosing better priors. I don't know what your data is but normal(0, 100) sounds very vague. I'd use prior-predictive checks to find priors that produce realistic looking data and than work with those priors. In my experience, this considerably speeds up sampling. Michael Betancourt wrote about how to do this, e.g. here. Hope that helps.