Marginalized regression coefficients

StaffanBetner commented 4 years ago

This is sort of related to #489, but expands it a bit.

Let's say we have a model for a mean parameter, something like $\mu = g^{-1}(X\beta + Z\nu)$, where $\nu$ is varying intercepts and/or slopes (e.g.). In that case, it would be useful to marginalize out the varying effects (maybe just some, all or just one) to get "fixed effects" on a population level.

The question is how to implement this in a Bayesian context. In Hedeker et al. (2017) they suggest a frequentist approach with Gauss–Hermite quadrature for approximating the integral (standard errors based on the delta method), but they have some relevant derivations applicable to a Bayesian case. In summary we would need to integrate out the varying effects for each observation in combination with each posterior draw of the parameters, and then for each posterior draw use the marginalized fitted response as a response for a marginal model.

In a case that is link-linear (e.g. a logistic regression) I suppose we can use the inverse link on the marginalized fitted response and just use OLS to obtain marginalized regression parameters, and iterate that over all posterior draws. This could be done in post–processing.

However, I think a more general way would be to estimate the marginal model directly in the Stan model together with the conditional model, by letting Stan handle the integration after the estimation of the conditional parameters, and specify a model like $g(\theta{marginal}) = f{marginal}(x)$ and maybe inherit the applicable priors from the conditional model. But I am not sure whether Stan is capable of that.

References:

Hedeker, D., du Toit, S. H. C., Demirtas, H., & Gibbons, R. D. (2017). A note on marginalization of regression parameters from mixed models of binary outcomes. Biometrics, 74(1), 354–361. doi:10.1111/biom.12707

paul-buerkner commented 4 years ago

I agree that computing predictions marginalized over certain terms, such as random effects is a worthwhile endeavour that I will tackle in the future of brms.

wds15 commented 4 years ago

Maybe start with gaussian likelihoods where this can be done analytically?

StaffanBetner commented 3 years ago

@wds15 It is dependent on the link function, not really the conditional distribution of the response/distribution of the random effect.

@paul-buerkner There is a fresh paper, Gory et al. (2020), providing some interesting results for "generalized linear mixed models adjusted for marginal interpretability". The idea is to calculate adjustment factors (interpretable sort of as the means of the random factors for each group), so that a GLMM with adjustment factors is reformulated as

where d_i^T a_i is the adjustment factors. Some useful results in the paper is e.g. that the adjustment factor always exists when the link function is an inverse CDF (such as the logit, probit, etc.) For the log link it only exists when the distribution of the random effects have a defined moment generating function (so the t-distribution is out of question in that case), but an analytical solution exists in that case (see proposition 3).

The question is whether something like this should be done in Stan, or in post-processing. Probably Stan since it depends on lots and lots of integrations (number of draws times number of observations) as the equation above have to be solved for each combination of draw and observation.

References: Gory, J. J., Craigmile, P. F., & MacEachern, S. N. (2020). A class of generalized linear mixed models adjusted for marginal interpretability. Statistics in Medicine, sim.8782. https://doi.org/10.1002/sim.8782

JWiley commented 3 years ago

@paul-buerkner and @StaffanBetner I am quite interested in working on implementing this as my work uses mixed models with non-linear link functions regularly. Is this something I could help out with trying to develop in brms?

Incidentally, a paper has approached this using a Bayesian model: Kazemi, I., Hassanzadeh, F. Marginalized random-effects models for clustered binomial data through innovative link functions. AStA Adv Stat Anal 105, 197–228 (2021). https://doi.org/10.1007/s10182-021-00400-0

paul-buerkner commented 2 years ago

It seems to me that the marginal effects features requested here are actually implemented in the new brmsmargins package (https://github.com/JWiley/brmsmargins) developed by @JWiley. Joshua, can you confirm?

paul-buerkner commented 2 years ago

Closing this issue following the comment of @JWiley in #552.

JWiley commented 2 years ago

@paul-buerkner I believe that they are related but not the same. brmsmargins currently does this essentially:

calculate a marginal prediction on the response scale also sometimes called a population-averaged prediction, in the case of mixed effects models with some transformation / nonlinear link, this takes the form (for example from a logit model):

I solve that using numerical Monte Carlo integration, nothing fancy and analytical. I could probably do better(?) but I am better at coding than math.

for a continuous variable rely on numerical derivatives getting prediction at x and x + h for some small value of h.

What is suggested in the paper that @StaffanBetner suggested: Gory, J. J., Craigmile, P. F., & MacEachern, S. N. (2020). A class of generalized linear mixed models adjusted for marginal interpretability. Statistics in Medicine, sim.8782. https://doi.org/10.1002/sim.8782

is actually marginally interpretable coefficients. These are actually, I believe, still on the link scale. They are adjusted to represent the average coefficient value in the sample/population marginal to the random effects only, but not marginalizing over other coefficients.

I would still be interested in helping to implement this for brms. It would let users with multilevel structured data fit an appropriate conceptual model and present a table of regression results that are more generally applicable / interpretable than conventional GLMM fixed effect coefficients.

One place that does have it implemented is GLMMadaptive (https://drizopoulos.github.io/GLMMadaptive/reference/marginal_coefs.html) which implements in a frequentist model the approach suggested by Donald Hedeker: Hedeker, D., du Toit, S. H., Demirtas, H. and Gibbons, R. D. (2018), A note on marginalization of regression parameters from mixed models of binary outcomes. Biometrics 74, 354--361. doi:10.1111/biom.12707

For example, for a random intercept only probit model: population average / marginal B = B / sqrt(1 + var_int)

I've been struggling to wrap my head around the Hedeker paper --- I think its basic suggestion is for example in the case of a Bernoulli outcome with mixed effects to first fit the regular GLMM, then calculate marginal predicted probabilities for each observation (brmsmargins can do this), use the logit transformation on those, and then calculate population averaged coefficients as: X'X^-1X'logits --- basically, I think, the 'usual' linear regression way? Hedeker than goes on to sort out standard errors, but I'm assuming we could ignore that work as they'd come naturally (I hope?) from the Bayesian posterior draws.

If it is calculated after the model, it would not need to be done in brms itself and could be an add on package. If as @StaffanBetner suggested the calculation occurred in Stan itself than obviously would be better included in brms directly.

I'd be curious about thoughts / interest. I suspect if we can figure out a way to implement in Stan it could be more efficient. If you're not interested, I will slowly see about moving these population average coefficients forward in brmsmargins.

paul-buerkner commented 2 years ago

Thank you for your insights. I am not sure to be honest if this marginal coefficient approach could be easily implemented in Stan. The fact that Hedeker et all compute the coefficients via linear regression after the fact is problematic as it strongly limits the models this can be applied to in brms, because it supports so many special terms, that, ideally would be taken into account too, somehow. So there may be some generalization of their method necessary, but I don't know for certain right now.

I will unlikely move these kinds of marginalized regression coefficients forward in brms itself. I don't want to push you implemeting it in brmsmargins. After all, I don't fully know how widely relevant those marginalized coefficients would be. That said, if you happen to find in sensible and within the scope of brmsmargins, than I am sure users would appreciate this feature.

paul-buerkner / brms

Marginalized regression coefficients #996