openpharma / brms.mmrm

R package to run Bayesian MMRMs using {brms}
https://openpharma.github.io/brms.mmrm/

Conditional means priors #40

Closed wlandau closed 2 months ago

wlandau commented 11 months ago

We talked about specifying a joint prior on the placebo means or treatment effects, then translating it to an induced prior on any parameterization of regression coefficients. The literature calls this approach "conditional means priors" (CMPs), or BCJ priors after the authors of https://www.jstor.org/stable/2291571, and the topic comes up in prior elicitation.

The method itself is exactly what you would expect: for a model of the form g(E(y|data)) = X*b for link function g(), and a vector of conditional means m such that g(m) = P*b given invertible matrix P, we specify independent priors on the components of m and simply consider the induced prior on b = P_inverse * g(m). The components of m become the true parameters of the model, and the regression coefficients in b are just transformed parameters. The full augmented model looks like:

y     ~ MVN(g_inverse(X*b), Sigma)  # likelihood
b     = P_inverse * g(m)            # transformation
m     ~ p(...)                      # prior
Sigma ~ LKJ(...)                    # prior

Since our link g() is the identity function (the MMRM is a linear model), the model collapses down to this:

y     ~ MVN(Q*m, Sigma)  # likelihood
m     ~ p(...)           # prior
Sigma ~ LKJ(...)         # prior

where Q = X * P_inverse. In other words, our true parameters are m and Sigma, and when we run MCMC, we draw joint posterior samples of (m, Sigma). This seems to be the most feasible (and recommended) way to handle CMPs. The alternative is to analytically transform the joint prior on m into a correlated joint prior on b beforehand, which seems extremely messy and error-prone even in a specialized case like ours. Then we would need to hack the Stan code of brms to implement a custom likelihood family because of the correlated priors on the components of b, which would also be extremely messy and error-prone. So unfortunately, I think CMPs force us away from brms.
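
For concreteness, here is a toy numeric sketch of the identity-link collapse (hypothetical factors and values, not package code):

# Toy sketch: with g = identity, E(y) = X %*% b = Q %*% m where
# Q = X %*% solve(P), so independent priors on m are all we need.
dat <- data.frame(group = factor(rep(1:3, each = 2)))
cells <- data.frame(group = factor(1:3))
X <- model.matrix(~ group, dat)    # design matrix for the observations
P <- model.matrix(~ group, cells)  # one row per cell, invertible
b <- c(1, 0.5, -0.5)               # regression coefficients
m <- P %*% b                       # conditional (cell) means: g(m) = P * b
Q <- X %*% solve(P)
all.equal(as.numeric(X %*% b), as.numeric(Q %*% m))  # TRUE: same likelihood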

wlandau commented 11 months ago

If we want to stick with brms for informative priors, I think it would be reasonable to change the parameterization to accommodate whatever kind of informative prior the user wants.

wlandau commented 11 months ago

Given that CMPs do not seem feasible with brms, we as a group decided to handle them in an alternative MMRM package. (Although I will first ask around to see if they are possible in brms somehow.)

wlandau commented 11 months ago

Just posted https://discourse.mc-stan.org/t/conditional-means-priors-in-brms/32160 to make sure we are not missing something.

andrew-bean commented 11 months ago

I may be missing something, but what prevents us from just fitting the reparametrized model

y ~ MVN(X * inverse(P) * m, Sigma)

where we just have a different fixed-effects design matrix W = X * inverse(P) instead of X? In the situation where X consists only of main effects and interactions for time and treatment, and we would rather put priors on the cell means m than on the regression coefficients, W would consist of dummy variables for the cell means. This can be done in brms.
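
A minimal sketch of what that could look like in brms, assuming a data frame observations with a response column and a cell factor (one level per group-by-time combination):

# Sketch: regress on cell indicators with no intercept, so the fitted
# coefficients are the cell means m and priors go on them directly.
library(brms)
fit_cell_sketch <- brm(
  response ~ 0 + cell,
  data = observations,
  prior = prior("normal(0, 1)", class = "b"),
  sample_prior = "only"
)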

Another interesting brms feature is around custom contrasts. If you do

contrasts(my_factor) <- my_contrast_matrix
brm(response ~ my_factor, ...)

my understanding is that it actually results in a reparametrized model with a different design matrix. I don't fully understand how brms handles this, but maybe someone on the Stan forums will, or I'll try to investigate more.

wlandau commented 11 months ago

Wow, you're right! Brilliant!

I think I missed it before because in the more general formulation for GLMs, we would have to go through a link function, and that would make things more difficult unless we formally transform the components of m accordingly.

So yes, I think it would be possible to implement this in brms.mmrm. We would need to manually construct the design matrix, but that seems doable.

Given @andrew-bean's workaround, what are everyone's thoughts about https://github.com/RConsortium/brms.mmrm/discussions/26?

wlandau commented 11 months ago

@andrew-bean, if you post https://github.com/RConsortium/brms.mmrm/issues/40#issuecomment-1642741668 to https://discourse.mc-stan.org/t/conditional-means-priors-in-brms/32160, I will mark it as the solution.

andrew-bean commented 11 months ago

@wlandau ok sounds fine, thanks. Done.

On the same topic: one key difference between the two parametrizations is that the cell means are correlated a priori under the usual prior (e.g. the mean responses for a treatment group at two different timepoints are correlated due to the shared main effect of treatment).

That correlation actually seems desirable to me, and it matters for the way historical data would inform a model. With independent priors on the cell means, historical data from one cell would influence the posterior only for that cell. That seems undesirable: if you handed me data showing a drug was ineffective at 4 and 8 weeks in a past study, it should change my prior for that drug at 12 weeks in a future study.

chstock commented 11 months ago

Thanks, @andrew-bean and @wlandau, this sounds smart! With the proposed solution, b isn't an explicit parameter in the model anymore; however, we would probably want to report the model results in the typical MMRM fashion (i.e. including b). In Stan we might use a transformed parameters {} block. I wonder how we would obtain b when working with brms.

wlandau commented 11 months ago

> With the proposed solution, b isn't an explicit parameter in the model anymore; however, we would probably want to report the model results in the typical MMRM fashion (i.e. including b). In Stan we might use a transformed parameters {} block. I wonder how we would obtain b when working with brms.

I think we could take the posterior samples of m and transform them into samples of b using inverse(P) * m. This would be straightforward as a post-processing step after brms finishes running the MCMC.
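
A quick sketch of that post-processing step, assuming a cell-means fit like the one above (fit and P are placeholders for the actual fit object and parameterization matrix):

# Sketch: map posterior draws of m to draws of b = inverse(P) * m.
# Rows of m_draws are draws; columns are the components of m.
m_draws <- brms::as_draws_matrix(fit, variable = "b", regex = TRUE)
b_draws <- m_draws %*% t(solve(P))  # one row per posterior draw of b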

> On the same topic: one key difference between the two parametrizations is that the cell means are correlated a priori under the usual prior (e.g. the mean responses for a treatment group at two different timepoints are correlated due to the shared main effect of treatment).

Yes, whereas the usual notion of conditional means priors seems to assume prior independence among the components of m. (At least that is how we would probably need to do it within brms.)

> That correlation actually seems desirable to me, and it matters for the way historical data would inform a model. With independent priors on the cell means, historical data from one cell would influence the posterior only for that cell. That seems undesirable: if you handed me data showing a drug was ineffective at 4 and 8 weeks in a past study, it should change my prior for that drug at 12 weeks in a future study.

So then maybe cell means are not always the right choice for m. We might want them to be contrasts of b which

  1. Are independent, and
  2. Are intuitive and useful with respect to assigning informative priors.

(1) not only justifies the modeling assumptions, it could also give us well-behaved posterior geometry and efficient HMC sampling. It might be hard to combine with (2), but it's worth taking time to think about which parameterizations would make sense for both.

andrew-bean commented 11 months ago

Was playing with some code to compare the two approaches (just simulating from priors here). Mostly this is to investigate the brms feature I mentioned about custom contrasts. See fit_cell_contrasts. Though I'm not sure how this feature could be useful to us.

Tacked on an illustration of backtransforming m samples to b in the way Will described.


devtools::load_all()
library(brms.mmrm)
library(dplyr)
library(tibble)
library(brms)
library(ggplot2)
library(bayesplot)
set.seed(0L)

n_group <- 3
n_patient <- 2
n_time <- 4

observations <- brm_simulate(
  n_group = n_group,
  n_patient = n_patient,
  n_time = n_time
)$data %>%
  mutate(
    group = paste("group", group),
    patient = paste("patient", patient),
    time = paste("time", time)
  ) %>%
  mutate(across(-response, factor))

cells <- observations %>%
  select(group, time) %>%
  distinct() %>%
  mutate(cell = factor(1:n(), 1:n()))

observations <- observations %>%
  left_join(cells, c("group", "time"))

# design matrix for E(y) = X * b
X <- model.matrix(~ group * time, data = observations)

# replace simulated responses using fixed values for b, Sig
b <- c(0, 2, 4, 0.4, 0.8, 1.2, 0, 0, 0, 0, 0, 0)
Sig <- matrix(0.5, nrow = n_time, ncol = n_time)
diag(Sig) <- 1
Z <- matrix(rnorm(nrow(observations)), ncol = n_time)
eps <- as.numeric(Z %*% chol(Sig))
observations$response <- as.numeric(X %*% b) + eps
plot_obs <- ggplot(observations, aes(x = time, color = group)) +
  geom_path(aes(y = response, group = patient))

# cell means m = inverse(M) * b under model E(y) = X * b
Minv <- model.matrix( ~ group * time, data = cells)
m <- Minv %*% b
M <- solve(Minv)
W <- X %*% M

# overlay true cell means
plot_obs + geom_path(aes(y = m, group = group), data = cells, linewidth = 2)

# three models:
# (sampling from the prior)

# 1. using a regression model for the mean
# (compound symmetry assumption for simplicity)
fit_reg <- brm(
  response ~ group * time + cosy(gr = patient),
  observations,
  prior = c(
    prior("normal(0, 1)", class = "b"),
    prior("normal(0, 1)", class = "Intercept"),
    prior("uniform(0, 1)", class = "cosy")
  ),
  sample_prior = "only"
)

# 2. model parametrized with independent cell means
fit_cell <- brm(
  response ~ 0 + cell + cosy(gr = patient),
  observations,
  prior = c(
    prior("normal(0, 1)", class = "b"),
    prior("uniform(0, 1)", class = "cosy")
  ),
  sample_prior = "only"
)

# 3. cell-mean parametrization, but using custom contrasts
observations$cell_contrasts <- observations$cell
contrasts(observations$cell_contrasts) <- Minv[,-1]
fit_cell_contrasts <- brm(
  response ~ cell_contrasts + cosy(gr = patient), # doesn't handle contrasts the same way if intercept is omitted
  observations,
  prior = c(
    prior("normal(0, 1)", class = "Intercept"),
    prior("normal(0, 1)", class = "b"),
    prior("uniform(0, 1)", class = "cosy")
  ),
  sample_prior = "only"
)

# Design matrix for the "regression" approach makes sense
X_reg <- standata(fit_reg)$X
X_reg <- X_reg[as.character(1:nrow(X)),]
all(X_reg == X)

# As does the "cell-means" approach
X_cell <- standata(fit_cell)$X
X_cell <- X_cell[as.character(1:nrow(X)),]
all(X_cell == W)

# brms handles the custom-contrasts model in an interesting way
X_cell_contrasts <- standata(fit_cell_contrasts)$X
X_cell_contrasts <- X_cell_contrasts[as.character(1:nrow(X)),]
# the design matrix matches X; not close to W
all(X_cell_contrasts == W)
all(X_cell_contrasts == X)

# How correlated are the priors induced on the cell means
# across time points for one group?
newdata <- filter(cells, group == "group 1") %>%
  mutate(patient = "foo") # otherwise the next line complains

newdata$cell_contrasts <- newdata$cell
contrasts(newdata$cell_contrasts) <- Minv[,-1]

lps <- lapply(
  setNames(list(fit_reg, fit_cell, fit_cell_contrasts),
           c("regression", "cell-mean", "cell-mean w/ custom contrasts")),
  posterior_linpred,
  newdata = newdata
)

# the cell-mean model has uncorrelated linear predictors
# regression does not
lapply(lps, function(x) round(cor(x), 2))
pairs_plots <- lapply(lps, function(x) bayesplot::mcmc_pairs(posterior::as_draws_matrix(x)))
pairs_plots[["regression"]]
pairs_plots[["cell-mean"]]

# transforming posterior draws for m to draws for b
m_draws <- brms::as_draws_matrix(fit_cell, variable = "b", regex = TRUE)
b_draws_trans <- m_draws %*% t(M)

# these are now correlated
cov(b_draws_trans) %>% round(1)
M %*% t(M)

# and have slightly different marginal distributions than under the X*b model
b_draws <- brms::as_draws_matrix(fit_reg, variable = "b", regex = TRUE)

plots <- lapply(list(b_draws_trans, b_draws), mcmc_areas)
bayesplot_grid(plots = plots, xlim = c(-10, 10),
               titles = c("Fit E(y) = W * m with m ~ N(0,1) then transform to b",
                          "Fit E(y) = X * b with b ~ N(0,1)"))

wlandau commented 11 months ago

This is really helpful to walk through, @andrew-bean!

I really like the way you calculate Minv from model.matrix(). Really elegant and easy to do correctly.

So for fit_cell_contrasts: it looks like if you nominally regress on m but give it contrasts that map to b, then the model actually estimates b as parameters. In other words, it estimates the contrasts instead of the individual factor levels. Do I have that right? If so, and if it is possible to assign contrasts to the entirety of group * time, would conditional means priors be possible using the contrast technique in the reverse direction? It's an interesting connection.

On the other hand, I would find it clearer to manually construct the model matrix and supply it directly to brm() (it should be possible to do this through the data). We might have to do that anyway, e.g. if we are adjusting for nuisance covariates and need to manually set a reference level so the components of m are not implicitly conditional on a subset of the data (e.g. at a level of a nuisance factor, cf. https://github.com/RConsortium/brms.mmrm/discussions/24#discussioncomment-5928853).
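
A hypothetical sketch of that manual route, reusing W = X %*% M from @andrew-bean's script and handing its columns to brm() through the data (correlation structure omitted for brevity):

# Sketch: store the columns of a hand-built design matrix in the data
# and regress on them with no intercept.
colnames(W) <- paste0("w", seq_len(ncol(W)))
data_w <- cbind(observations["response"], as.data.frame(W))
fit_manual <- brm(
  reformulate(colnames(W), response = "response", intercept = FALSE),
  data = data_w,
  prior = prior("normal(0, 1)", class = "b"),
  sample_prior = "only"
)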

wlandau commented 11 months ago

So along the lines of https://github.com/RConsortium/brms.mmrm/issues/40#issuecomment-1643076271:

> That correlation actually seems desirable to me, and it matters for the way historical data would inform a model. With independent priors on the cell means, historical data from one cell would influence the posterior only for that cell. That seems undesirable: if you handed me data showing a drug was ineffective at 4 and 8 weeks in a past study, it should change my prior for that drug at 12 weeks in a future study.

and https://github.com/RConsortium/brms.mmrm/issues/40#issuecomment-1643910109:

> So then maybe cell means are not always the right choice for m. We might want them to be contrasts of b which
>
> 1. Are independent, and
> 2. Are intuitive and useful with respect to assigning informative priors.

@andrew-bean and others: if we choose something other than cell means for m, what would you propose?

weberse2 commented 11 months ago

I have not absorbed all the content above, but a good read on factor contrasts in R is this document:

https://bbolker.github.io/mixedmodels-misc/notes/contrasts.pdf

For MMRMs, I think the MASS::contr.sdif contrasts are a really good fit. These are forward differences, and they let the parameters mean things like the difference between visit 3 and visit 2 (consecutive visits). Defining contrasts is a very standard thing in R, which is why I consider this a very attractive way to set up a given parametrisation.
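
For example, a small sketch of what contr.sdif looks like on a four-visit factor:

# Sketch: successive-difference contrasts make the time coefficients mean
# consecutive-visit differences: visit 2 - visit 1, visit 3 - visit 2, etc.
library(MASS)
time <- factor(paste0("visit ", 1:4))
contrasts(time) <- contr.sdif(4)
contrasts(time)  # the contrast matrix model.matrix() (and hence brms) would use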

wlandau commented 11 months ago

Thanks for the reference, @weberse2. I have always had trouble with contrast notation in R, but maybe I won't after reading that article.

wlandau commented 11 months ago

Reading https://bbolker.github.io/mixedmodels-misc/notes/contrasts.pdf, I am reminded of how uncertain I feel when I use the contrast interface and lme4-like formulas.

I may be in the minority, but even after 10+ years of using R to do statistics, I still find it hard to trust that the contrast/formula machinery is faithfully representing a model I have in mind. It takes a lot of mental arithmetic and cognitive load just to make sure I am not making obvious errors.

I would prefer to set up the regression coefficients in a more explicit and pedantic way, with transparent/verbose assurances, and strict guardrails (in our case, simple objects passed to well-documented fit-for-purpose formal arguments).

weberse2 commented 11 months ago

I do understand your concerns with the R contrast system, which is not super intuitive. It's just that a lot of R works this way, and going against this established system is a deliberate choice. However, you are probably right that the majority of people are not familiar with these R conventions anyway, so it is probably better to create your own.

wlandau commented 10 months ago

Taking a step back and revisiting https://github.com/openpharma/brms.mmrm/issues/40#issuecomment-1643995915, I wonder if explicit CMPs are even necessary for our identity link function. As @andrew-bean observed, the model matrix just collapses back down into a different model matrix. In that situation, I would think it is more efficient to choose whatever prior-friendly parameterization makes sense (for example, cell means) and then compute quantities like the intercept as transformed parameters if needed.

Of course, if we want our priors on the fixed effects to be correlated, that might complicate things. But are correlated priors even necessary to consider? It seems like a challenge to account for correlations between different time points in the prior, especially if we need to elicit that prior. And we already have a flexible correlation matrix on the residuals.

weberse2 commented 10 months ago

Well... if you are going for non-informative priors, then cell means are easy and simple to set up. In practice, though, I'd much prefer to put a prior on the difference in the response from visit to visit. This difference is usually not big, and it immediately creates a correlation structure which is appropriate for the problem at hand in many cases.
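
A small numeric sketch of that induced correlation, assuming independent N(0, 1) priors on the visit-to-visit differences and visit means defined as their cumulative sums:

# The visit means are means = L %*% diffs for a lower-triangular L of ones,
# so independent priors on the differences imply correlated priors on the means.
L <- 1 * lower.tri(diag(4), diag = TRUE)
round(cov2cor(L %*% t(L)), 2)  # induced prior correlation of the visit means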

wlandau commented 2 months ago

Coming back to this after almost a year, I think it would help to list all the major cases we care about for assigning priors. I can think of priors on:

  1. The difference in response from visit to visit.
  2. Cell means.
  3. Cell means for placebo response, but effects for treatment.

Any others?

wlandau commented 2 months ago

I just added a prototype for time differences: https://github.com/openpharma/brms.mmrm/discussions/92. Would be great to discuss this and #89 in our meeting tomorrow.

wlandau commented 2 months ago

For this issue, we seem to be heading away from conditional means priors and towards reparameterization in general. In the original formulation by Bedrick et al., CMPs are about using elicited priors on the response scale to induce priors on the model coefficients. We don't actually need that here. All we need is a model with a parameterization amenable to informative priors. Beyond that point, any deterministic function of the posterior samples is straightforward.

wlandau commented 2 months ago

Moving this to a discussion in favor of #96.