paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.28k stars 184 forks source link

Multi-membership group-level structure #130

Closed paul-buerkner closed 7 years ago

paul-buerkner commented 8 years ago

In MCMCglmm multi-membership can be modeled via tha mult.memb function. Using this implementation in brms, however, would probably not be optimal as it leads to sparse matrices that are not yet handled efficiently in Stan. Accordingly, we will likely need a different implementatin and also have to think of a nice formula syntax to specify multi-membership structures.

holgersr commented 7 years ago

Hi Paul,

thank you for your great work that results in so very helpful software, such as brms.

I actually started working with brms and stan this morning, and I am also not too experienced with github (so please escuse any stupid mistakes), but I think I just found an (unelegant) workaround to use a multi-membership structure with stan (which I compleely based on the result from the stancode function).

Maybe (and hopefully) this is in some way helpful for you?! I tried to comment the steps; please let me know if you have any problems/questions/suggestions.

Thank you again!

With kind regards, Holger

## Multi-membership structure using brms
## Date: December 1, 2016

library("brms")
mm_Z <- function(dat, id_a, id_b){
  aux_a <- dat[, names(dat) == id_a]
  aux_b <- dat[, names(dat) == id_b]
  aux_check <- any(c(!is.factor(aux_a), !is.factor(aux_b)))
  if(aux_check){
    stop("id_a and id_b need to be factors.")
  }
  ## Give both the set of all unique levels:
  aux_levels <- sort(unique(c(levels(aux_a), levels(aux_b))))
  aux_a <- factor(as.character(aux_a), levels = aux_levels)
  aux_b <- factor(as.character(aux_b), levels = aux_levels)
  ## table(aux_a, dat[, names(dat) == id_a]); table(aux_b, dat[, names(dat) == id_b]) ## Check: ok!
  Z_a <- do.call(rbind, lapply(aux_a, FUN = function(x, aux){as.numeric(x == aux)}, aux = aux_levels))
  Z_b <- do.call(rbind, lapply(aux_b, FUN = function(x, aux){as.numeric(x == aux)}, aux = aux_levels))
  ## require("dummies"); all(Z_a - dummy(aux_a, drop = FALSE) == 0); all(Z_b - dummy(aux_b, drop = FALSE) == 0) ## Check: ok!
  Z <- Z_a + Z_b
  colnames(Z) <- aux_levels
  return(Z)
}

## Simulate multi-membership data:
set.seed(123)
N <- 2000
n <- length(letters)
d <- t(combn(letters[1:n], m = 2))
d <- d[sample(1:nrow(d), replace = T, size = N), ]
table(apply(d, MAR = 1, FUN = paste, collapse = "_"))

d <- data.frame(ID_a = d[, 1], ID_b = d[, 2], stringsAsFactors = TRUE)

Z <- mm_Z(dat = d, id_a = "ID_a", id_b = "ID_b")
beta_0 <- 0.5 ## true intercept
beta_1 <- 1 ## true coefficient of covariate x
x <- sample(c(0, 1), replace = T, size = N)
u <- rnorm(ncol(Z), sd = 0.2)
y <- beta_0 + beta_1*x + Z %*% u + rnorm(N, sd = 0.5)

d <- data.frame(y = as.numeric(y), x = x, ID_a = d$ID_a, ID_b = d$ID_b)

## A misspecified random structure as auxiliary object:
m <- brm(formula = y ~ x + (1 | ID_a) + (1 | ID_b),
            data = d, family = gaussian(),
            prior = c(set_prior("normal(0,100)", class = "b"),
                      set_prior("cauchy(0,2)", class = "sd")),
            warmup = 1000, iter = 2000, chains = 1,
            control = list(adapt_delta = 0.95))
## stancode(m) ## Use the as template for the following
## Only few changes should be enough to get to a multi-membership structure?!
code <- "// This Stan code was generated with the R package 'brms'. 
// We recommend generating the data with the 'make_standata' function. 
functions { 
} 
data { 
int<lower=1> N;  // total number of observations 
vector[N] Y;  // response variable 
int<lower=1> K;  // number of population-level effects 
matrix[N, K] X;  // population-level design matrix 
// data for group-level effects of ID 1 
int<lower=1> J_1[N]; 
int<lower=1> N_1; 
int<lower=1> M_1; 
vector[N] Z_1_1; 
// data for group-level effects of ID 2 
int<lower=1> J_2[N]; 
//int<lower=1> N_2; 
//int<lower=1> M_2; 
//vector[N] Z_2_1; 
int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
int Kc; 
matrix[N, K - 1] Xc;  // centered version of X 
vector[K - 1] means_X;  // column means of X before centering 
Kc = K - 1;  // the intercept is removed from the design matrix 
for (i in 2:K) { 
means_X[i - 1] = mean(X[, i]); 
Xc[, i - 1] = X[, i] - means_X[i - 1]; 
} 
} 
parameters { 
vector[Kc] b;  // population-level effects 
real temp_Intercept;  // temporary intercept 
real<lower=0> sigma;  // residual SD 
vector<lower=0>[M_1] sd_1;  // group-level standard deviations 
vector[N_1] z_1[M_1];  // unscaled group-level effects 
// vector<lower=0>[M_2] sd_2;  // group-level standard deviations 
// vector[N_2] z_2[M_2];  // unscaled group-level effects 
} 
transformed parameters { 
// group-level effects 
vector[N_1] r_1_1; 
// group-level effects 
// vector[N_2] r_2_1; 
r_1_1 = sd_1[1] * (z_1[1]); 
// r_2_1 = sd_1[1] * (z_2[1]); 
} 
model { 
vector[N] eta; 
eta = Xc * b + temp_Intercept; 
for (n in 1:N) { 
eta[n] = eta[n] + r_1_1[J_1[n]] * Z_1_1[n] + r_1_1[J_2[n]] * Z_1_1[n]; 
} 
// prior specifications 
b ~ normal(0,100); 
sigma ~ student_t(3, 0, 10); 
sd_1 ~ cauchy(0,2); 
z_1[1] ~ normal(0,1); 
//sd_2 ~ cauchy(0,2); 
//z_2[1] ~ normal(0,1); 
// likelihood contribution 
if (!prior_only) { 
Y ~ normal(eta, sigma); 
} 
} 
generated quantities { 
real b_Intercept;  // population-level intercept 
b_Intercept = temp_Intercept - dot_product(means_X, b); 
} "
dat <- standata(m)
dat$N_1 <- dat$N_1 + 1 ## Because of row-wise alphabetical order by combn(), the last letter is never included in the first column

## Fit the multi-membership structure model:
fit1 <- stan(model_code = code, iter = 2000, verbose = TRUE, data = dat, chains = 1, 
             control = list(adapt_delta = 0.999, stepsize = 0.001, max_treedepth = 20))
hat_u <- summary(fit1)$summary[31:56, 1]
plot(u, hat_u); grid(); abline(a=0, b=1)

## Refit the misspecified random structure as auxiliary object in order to have two stanfit objects:
fit2 <- stan(model_code = stancode(m), iter = 2000, verbose = TRUE, data = standata(m), chains = 1, 
             control = list(adapt_delta = 0.999, stepsize = 0.001, max_treedepth = 20))

## Compare:
summary(fit1)$summary[1:4, ]
summary(fit2)$summary[c(1:4, 30), ]

hat_ub <- rep(0, 26)
hat_ub[1:25] <- summary(fit2)$summary[56:80, 1]
hat_ub[2:26] <- hat_ub[2:26] + summary(fit2)$summary[81:105, 1]

plot(u, hat_u, xlim = range(c(u, hat_u)), ylim = range(c(u, hat_u))); grid(); abline(a=0, b=1)
points(u, hat_ub, col = 2)

mean((u - hat_u)^2)
mean((u - hat_ub)^2)

## It would be cool to see the difference by LOO(), as the following doesn't seem to work:
## - gives warnings, and 
## - the result is surprising.
library("loo")
log_lik1 <- extract_log_lik(fit1, parameter_name = "lp__")
loo1 <- loo(log_lik1)
loo1 ## warning; I don't trust this result.

log_lik2 <- extract_log_lik(fit2, parameter_name = "lp__")
loo2 <- loo(log_lik2)
loo2 ## warning; I don't trust this result.
paul-buerkner commented 7 years ago

Hi Holger,

thanks the detailed code! I will look at it in details within the next few days. :-)

holgersr commented 7 years ago

Hi Paul,

thanks for your reply!

Of course, the last part on the Leave-one-out Information Criterion (LOOIC) was completely wrong, sorry.

If you change in the upper stan code the last generated quantities part to:

generated quantities { real b_Intercept; // population-level intercept vector[N] log_lik; b_Intercept = temp_Intercept - dot_product(means_X, b); for (n in 1:N) log_lik[n] = normal_lpdf(Y[n] | Xc[n]*b + temp_Intercept, sigma); }

then running the lines:

library("loo") log_lik1 <- extract_log_lik(fit1, parameter_name = "lp__") loo1 <- loo(log_lik1) loo1

leads to:

Computed from 1000 by 2000 log-likelihood matrix Estimate SE elpd_loo -1720.5 37.6 p_loo 32.3 1.1 looic 3441.0 75.2 All Pareto k estimates OK (k < 0.5)

which makes sense now in comparison to:

LOO(m, re_formula = NA) LOOIC SE 3487.42 81.83

-> In my intuition, specifying the correct grouping structure should yield some benefit also in terms of the beta coefficients, and it seems to do this when quantified on the basis of LOOIC including only beta information.

Including the same generated quantities part to the returned code from stancode(m) leads to:

Computed from 1000 by 2000 log-likelihood matrix Estimate SE elpd_loo -1740.3 40.8 p_loo 35.8 1.2 looic 3480.6 81.6 All Pareto k estimates OK (k < 0.5)

I assume that the rather small difference in comparison to LOO(m, re_formula = NA) just stems from a different randomly drawn posterior sample?!

paul-buerkner commented 7 years ago

Hi Holger,

I see from the Stan model that r_1_1[J_1[n]] * Z_1_1[n] + r_1_1[J_2[n]] * Z_1_1[n] is the important part, so that observation n belongs to (in this example) two groups each equally weighted (right?). I am not an expert in multimembershp models so I have to ask a few questions / points to discuss.

(a) I think we somehow have to incorporate the weighting. In your case, we implicitely use the weight 1 two times, but I think we need to generalized this to arbitrary weights. Also, I think it is common that the weights sum to one, or am I mistaken?

(b) We have to find a syntax for multimembership models that fits nicely into the existing syntax. My initial idea was to write in a group-level term (... | weight1 * group1 + weight2 * group2 + <optionally more groups>) or use : instead of *. Do you think that makes sense?

(c) Is it reasonable to have random slopes (not just a random intercept) in a multi-membership term?

holgersr commented 7 years ago

Hi Paul,

I approach multi-membership models always from dyadic observations, such as observing the outcome of two people doing something. For example, they need to solve a certain task together, and as the outcome you observe how long they needed to fulfill this task. In the assumption underlying a multi-membership structure, the outcome depends only on the sum of the two single contributing abilities from both persons, and so there is only one single grouping-term, where the coefficients measure the ability of a person to contribute to solving the task. A multi-membership term is then the technical way of including this into the linear predictor of a regression model, and a multi-membership term needs to set up a indicator matrix Z that might have several 1 values in one single line, and not just only one 1 at the element that belongs to the level that was observed for observation n.

So, I would say:

(a) While the more general weighting would not be needed for the above example, there might be scenarios where this is usefull (several persons from different cities doing something together, for example). But I think that the weights don't necesarrily need to sum up to 1, so they are rather absolte, not relative frequencies of how often to incorporate one anility into the model's linear predictor?! (b) Great, that would be an optimal solution, I think! (I would prefer the * syntax over the : syntax) (c) Yes, I think that the multi-membership structure is a quite general grouping scheme, and so it should also translate to other more complex situations than only varying (random) intercepts.

Thank you again for your work! And have a nice day!

paul-buerkner commented 7 years ago

(a) Ok, I guess I will still standarize the weights to sum to one internally so that random effects stay on the same scale.

(b) With respect to the internal handling of the stats package for the symbols * and : in formulas, I guess : is the canonical solution, so likely I will stick to it (not yet decided though).

(c) Makes sense.

Hopefully, I will find the time to implement this within the next one or two weeks.

paul-buerkner commented 7 years ago

Multi-membership models are now more or less implemented on github. They just need some documentation and unit tests.

You can fit a model as follows (using the example of @holgersr):

m <- brm(formula = y ~ x + (1 | mm(ID_a, ID_b)),
         data = d, family = gaussian(),
         prior = c(set_prior("normal(0,100)", class = "b"),
                   set_prior("cauchy(0,2)", class = "sd")),
         warmup = 1000, iter = 2000, chains = 1,
         control = list(adapt_delta = 0.95))

If you want members to be weighted unequally, replace mm(ID_a, ID_b) with mm(ID_a, ID_b, weights = cbind(w1, w2)), where w1 and w2 are variables specifiying the weights.

Multimembership terms are fully compatible with all other features brms offers.