paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.27k stars 183 forks

How to marginalize rather than condition on variables to make the output of brms marginal_effects literal AME, MER, and MEM #552

Closed realkrantz closed 2 years ago

realkrantz commented 5 years ago

I propose that the function marginal_effects be able to estimate the following marginal effects:

Quoting Leeper 2018, "MERs calculate the marginal effect of each variable at a particular combination of X values that is theoretically interesting. MEMs calculate the marginal effects of each variable at the means of the covariates. AMEs calculate marginal effects at every observed value of X and average across the resulting effect estimates.

AMEs are particularly useful because — unlike MEMs — they produce a single quantity summary that reflects the full distribution of X rather than an arbitrary prediction. Together with MERs, AMEs have the potential to convey a considerable amount of information about the influence of each covariate on the outcome. Because AMEs average across the variability in the fitted outcomes, they can also capture variability better than MEMs.

[MERs operate] on a modified set of the full dataset wherein particular variables have been replaced by specified values. This is helpful because it allows for calculation of marginal effects for counterfactual datasets (e.g., what if all women were instead men? what if all democracies were instead autocracies? what if all foreign cars were instead domestic?).

While MERs provide a means to understand and communicate model estimates at theoretically important combinations of covariate values, AMEs provide a natural summary measure that respects both the distribution of the original data..."

For the case of logistic regression with a binary predictor, AMEs measure the discrete change, i.e. how predicted probabilities change as the binary independent variable changes from 0 to 1, averaged over the actually observed cases.

For a continuous predictor, calculus can be used to compute the AME, but Cameron and Trivedi (2010) show that it can also be computed manually as follows (an R sketch of this procedure follows the list):

  1. Compute the predicted values using the observed values of the variables. We will call this prediction1.
  2. Change one of the continuous independent variables by a very small amount. Cameron and Trivedi suggest using the standard deviation of the variable divided by 1000. We will refer to this as delta.
  3. Compute the new predicted values for each case. Call this prediction2.
  4. The per-case marginal effect is then xme = (prediction2 - prediction1) / delta.
  5. Compute the mean value of xme. This is the AME for the variable in question.
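
A minimal R sketch of this finite-difference procedure with brms, assuming a fitted model `fit`, its data frame `dat`, and a continuous predictor named `"x"` (all hypothetical names). Because everything is computed per posterior draw, the quantiles at the end give credible limits for the AME:

```r
library(brms)

ame_continuous <- function(fit, dat, variable) {
  delta <- sd(dat[[variable]]) / 1000           # step size per Cameron & Trivedi
  dat2 <- dat
  dat2[[variable]] <- dat2[[variable]] + delta  # shift the predictor slightly

  # posterior draws of the expected response for original and shifted data
  prediction1 <- fitted(fit, newdata = dat,  summary = FALSE)
  prediction2 <- fitted(fit, newdata = dat2, summary = FALSE)

  # numerical derivative per draw and observation, then average over observations
  xme <- (prediction2 - prediction1) / delta
  ame_draws <- rowMeans(xme)                    # one AME per posterior draw

  c(estimate = mean(ame_draws),
    lower    = unname(quantile(ame_draws, 0.025)),
    upper    = unname(quantile(ame_draws, 0.975)))
}

# ame_continuous(fit, dat, "x")
```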

The current issue is related to the issue https://github.com/paul-buerkner/brms/issues/489 and a discussion on https://discourse.mc-stan.org.

Any idea when we will have this feature in brms?

paul-buerkner commented 5 years ago

Thanks for opening this issue. If I understand the terms correctly, MEMs are what marginal_effects does by default, while MERs can be achieved via the conditions argument. It is indeed AMEs that are missing.

As you will understand, maintaining brms is not something I get paid for, and it is effectively a second full-time job for me anyway. So I can only tell you it's done when it's done.

realkrantz commented 5 years ago

Thanks a lot for this @paul-buerkner. It is understandable. Just one more question. The MER functions in the packages mentioned above [1,2] predict and plot differences in probabilities (AMEs) across the covariate being conditioned on, giving one line per binary treatment. In brms, however, we get two lines, one for each category of the treatment. Any tips on how to get MERs that reflect the difference in probabilities when the binary treatment changes from 0 to 1, across the covariate we are conditioning on (one line per binary treatment)?

paul-buerkner commented 5 years ago

Do it manually using fitted(..., summary = FALSE) with new data; fitted() is the function that underlies marginal_effects. I trust you will figure this out.
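
A minimal sketch of this manual approach (an editor's illustration, not an official brms feature), assuming a Bernoulli model such as `fit <- brm(y ~ treat * x, family = bernoulli(), data = dat)`, where `treat` is the binary treatment coded 0/1 and `x` is the conditioning covariate; all names are hypothetical. Since the differences are computed per posterior draw, credible limits come directly from quantiles of the draws:

```r
library(brms)

x_grid <- seq(min(dat$x), max(dat$x), length.out = 50)
nd0 <- data.frame(x = x_grid, treat = 0)
nd1 <- data.frame(x = x_grid, treat = 1)

# posterior draws of the predicted probability under each treatment level
p0 <- fitted(fit, newdata = nd0, summary = FALSE)  # draws x grid points
p1 <- fitted(fit, newdata = nd1, summary = FALSE)

diff_draws <- p1 - p0  # per-draw difference in probabilities at each value of x

# one row per grid point: posterior mean and 95% credible interval
out <- data.frame(
  x        = x_grid,
  estimate = colMeans(diff_draws),
  lower    = apply(diff_draws, 2, quantile, 0.025),
  upper    = apply(diff_draws, 2, quantile, 0.975)
)
head(out)
```

Plotting `estimate` with its interval against `x` gives the single difference curve per binary treatment asked about above.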

realkrantz commented 5 years ago

Thanks. Would it be reasonable to simply calculate the differences in probabilities manually from the data extracted from the brms marginal_effects object?

These differences in probabilities between the levels of a binary treatment, across the values of the conditioning covariate and marginalizing over all the other covariates, are the quantities of interest, but I am not certain how to derive the corresponding confidence limits.

Under frequentist inference, the delta method is usually used to compute the standard error (SE) for marginal effects. I am not certain how the equivalent could be obtained in brms. Any input here would be much appreciated.

mjskay commented 5 years ago

@paul-buerkner Average marginal effects are something I've been contemplating adding more explicit support for in tidybayes, but was holding off on to see if you did something here :). If this is not a high priority for you, I might forge ahead there --- a potential advantage of having this in tidybayes is that the same code should hopefully work on both rstanarm and brms.

@realkrantz Doing AMEs for categorical variables is fairly straightforward with tidybayes without additional modification, I think. Assuming I understand them correctly, I put together an example here. Doing AMEs for continuous variables looks like it could be more of a pain...
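
A rough sketch of one way to do this with tidybayes (an editor's illustration, not @mjskay's actual gist), assuming a brms fit `fit` on data `dat` with a binary factor `treat`; all names are hypothetical, and a reasonably recent tidybayes is assumed:

```r
library(dplyr)
library(tidyr)
library(tidybayes)

dat %>%
  mutate(.obs = row_number()) %>%             # tag each observed row
  select(-treat) %>%
  crossing(treat = levels(dat$treat)) %>%     # every observed row at every level
  add_epred_draws(fit) %>%                    # draws of E[y] on the response scale
  compare_levels(.epred, by = treat) %>%      # per-draw, per-row differences between levels
  group_by(treat, .draw) %>%
  summarise(ame = mean(.epred), .groups = "drop") %>%  # average over the observed rows
  group_by(treat) %>%
  median_qi(ame)                              # posterior median and 95% interval
```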

paul-buerkner commented 5 years ago

Thanks, @mjskay! It's not a high priority for me, not because it's unimportant, but because I need to figure out many subtleties in the implementation and I currently don't have the time to do it. My concern is that it's not immediately clear to me how to marginalize over more than one (up to all other) predictors without adding a huge computational burden, in particular if some of them are continuous. But I haven't looked into the details too much. So maybe there is an elegant way to do it.

realkrantz commented 5 years ago

Thanks a lot @mjskay @paul-buerkner. This is tremendous and very important work. tidybayes (https://github.com/mjskay/tidybayes) is truly powerful, and the way it interacts with brms and rstanarm makes everything very convenient and smooth.

I would like to compute MERs and AMEs for categorical variables. These are a very informative way of summarizing how a change in the response is related to a change in a covariate. So the solution you just provided is of tremendous practical value. Further considerations below, based on formulas (3) and (4) from here.

For MER, is it possible to calculate directly the quantities B_diff_a1 = E[y|A=a1, B=b2] - E[y|A=a1, B=b1] and B_diff_a2 = E[y|A=a2, B=b2] - E[y|A=a2, B=b1] with their confidence limits? These differences in marginal means, conditional on particular values of our predictors, would be our MERs of changing B from b1 to b2 at A=a1 and A=a2, respectively. For this case, our treatment is B and the representative values of our covariate A are a1 and a2.

For AME, is it possible to estimate directly A_diff = E[y|A=a2] - E[y|A=a1] with its confidence limits? This quantity would give us the difference between the average expected outcomes at the two values of A, marginalizing over B. This would be our AME of changing A from a1 to a2. For this case, the treatment is A, the levels of the treatment are a1 and a2, and the covariate we are marginalizing over is B.

Thank you very much for this.

mjskay commented 5 years ago

For both types of mean differences, you should be able to use compare_levels. I updated the AME example I sent before to include an example of compare_levels.

More info on compare_levels is also in the tidybayes + brms vignette.
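
A short sketch of the MER-type quantities above (B_diff_a1 and B_diff_a2) using compare_levels, assuming a brms fit `fit` whose only predictors are the factors A and B; all names and values are hypothetical:

```r
library(tidyr)
library(tidybayes)

crossing(A = c("a1", "a2"), B = c("b1", "b2")) %>%  # representative values
  add_epred_draws(fit) %>%                          # draws of E[y | A, B]
  compare_levels(.epred, by = B) %>%                # draws of E[y | A, b2] - E[y | A, b1]
  median_qi(.epred)                                 # one row per value of A
```

The AME-type quantity (A_diff) follows the same pattern as the earlier categorical-AME sketch: build counterfactual data from the observed rows, compare levels of A, and average over the rows before summarizing.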

realkrantz commented 5 years ago

@mjskay This is great. This allows estimation of AMEs and MERs in a smooth manner. Thank you very much.

paul-buerkner commented 4 years ago

@jackobailey writes in #767:

For models other than linear regression, interpreting model coefficients is hard. This is often because they refer to abstract scales (logits, probits) rather than that of the outcome we're interested in (probabilities, counts). One way of overcoming this is to compute average marginal effects. Reporting these is near standard in some fields (economics, political science).

@leeper has implemented these for many model types in his margins package. He has also written a good accompanying paper that explains marginal effects and how to calculate them.

The margins package does not work with brms. This is unfortunate, as brms offers unparalleled opportunities when it comes to estimating non-linear models. Thus, I propose that you implement the computation of average marginal effects in brms.

I have had to compute marginal effects for brms models for a project that I am working on. Here's my attempt at a first pass based on @leeper's code. As you can see, the code is relatively straightforward and, as it relies on the fitted function, suitably generic as to be extendable to most model types.

statsccpr commented 4 years ago

So I've been following the rabbit hole on 'marginal' effects of regression models related to brms. The way I see it, there are two related but distinct paths.

I think the past solutions directly address the first path, where the quantity to average out is categorical:

  1. Start with E[Y | A, B] and output E[Y | A] by 'averaging' over the categorical B using a literal weighted mean. The two examples above are illustrations of this.
  2. The other path is when the variable being averaged out is not categorical, so you need to average over a distribution: start with E[Y | A, B] and output E[Y | A] by 'averaging' over the continuous distribution of B using integration (whether numerical or Monte Carlo).

From what I can tell, the margins package addresses (1) but does not yet directly address (2). Meaning, if a user has a regression model with 'random effects' and wants output that integrates out the distribution of the random effect, they still need to roll it by hand, as in the rpsychologist post, which is a worked-out example of Paul's simulation-draws approach.

Or, as @jackobailey suggests in https://github.com/paul-buerkner/brms/issues/489#issuecomment-541676662, rope in the margins package if it helps with the intermediate computations. But thinking this idea through, I don't think the margins package lets the user designate the distribution of the random effect as the term being 'integrated out'; I'm guessing the margins package is still going to focus on the non-random-effect terms.

I think more people are going to start demanding this second option. There are already non-brms solutions, e.g. https://github.com/drizopoulos/GLMMadaptive.

paul-buerkner commented 4 years ago

I agree with your assessment. Actually, we have three situations here, as marginalizing over continuous variables is technically not the same as marginalizing over random effects. The former are "just" data, while the latter are parameters with corresponding posterior distributions, which makes the marginalization computationally much more demanding.

I think we should address them in order, that is:

  1. marginalize over categorical variables (relatively easy)
  2. marginalize over continuous variables, perhaps by converting them to categorical variables with a sufficient number of categories
  3. marginalize over random effects (a rough sketch of this case follows below)
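
One way to approach point 3, as an editor's sketch rather than a built-in brms feature: average predictions over many hypothetical new group levels drawn from the estimated random-effect distribution. This assumes a model like `fit <- brm(y ~ treat + (1 | group), ...)` with hypothetical variable names:

```r
library(brms)

newdat <- expand.grid(
  treat = c(0, 1),
  group = paste0("new_", 1:200)   # new group levels to average (integrate) over
)

# sample_new_levels = "gaussian" draws the new group effects from their
# estimated normal distribution rather than reusing observed levels
pred <- fitted(
  fit, newdata = newdat, summary = FALSE,
  allow_new_levels = TRUE, sample_new_levels = "gaussian"
)

# average over the simulated groups within each draw and treatment level,
# then contrast the two treatment levels
mu <- sapply(
  split(seq_len(nrow(newdat)), newdat$treat),
  function(idx) rowMeans(pred[, idx, drop = FALSE])
)
ame_draws <- mu[, "1"] - mu[, "0"]
quantile(ame_draws, c(0.025, 0.5, 0.975))
```
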
mjskay commented 4 years ago

I think with the posterior package (and especially careful thinking around https://github.com/jgabry/posterior/issues/39) it would be nice to design an interface in posterior that can do these three types of marginalizations for any model that supports posterior's consistent interface for marginal means, posterior predictive distributions, and random effects.

This might require being a bit more prescriptive about arguments to functions like pp_expect and/or providing functions that can inspect a model (e.g. get info about predictors, random effects, etc).

leeper commented 4 years ago

I don't think I'll ever do (2) in margins as I'm narrowly focused on the use case in (1).

paul-buerkner commented 4 years ago

Andrew posted a related paper on Discourse, which could help a lot: https://discourse.mc-stan.org/t/request-for-average-predictive-comparisons-in-rstanarm-and-brms/12477

jackobailey commented 4 years ago

I have a suggestion that might work for factors, but not continuous variables. Still, it would make some good progress.

Rather than trying to marginalise out factors post hoc, could you do a lot of the work under the hood during the fitting process? I'm thinking of something like using cell-mean coding or index coding by default when fitting the model, computing the grand mean of each factor for users to condition on, and then returning contrast-coded output in the summary regardless. Much like how you rescale the intercept under the hood at the moment.

It would be similar to the approach you discuss here.

paul-buerkner commented 4 years ago

Thanks, I will take a look!

paul-buerkner commented 2 years ago

It seems to me that the marginal effects features requested here are actually implemented in the new brmsmargins package (https://github.com/JWiley/brmsmargins) developed by @JWiley. Joshua, can you confirm?

JWiley commented 2 years ago

@paul-buerkner I believe this should be true for many cases. So long as the correct input data are specified, brmsmargins should calculate AMEs, MEMs, and MERs for many models supported by brms, and I hope to add support for more in the future (anyone with specific interests, please open an issue on brmsmargins). Uniquely among the current options, I believe, brmsmargins also supports integrating over random effects to obtain correct marginal estimates from random-effects models (albeit at considerable computational burden).
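
A rough usage sketch, based on an editor's reading of the brmsmargins vignettes; the argument names used here (at, add, contrasts, effects, CI, CIType) and the ContrastSummary output are assumptions that should be checked against the package documentation. It assumes a mixed model such as `fit <- brm(y ~ treat + x + (1 | group), family = bernoulli(), ...)` with hypothetical variable names:

```r
library(brmsmargins)

# AME of the binary treatment, integrating over the random effects
ame_treat <- brmsmargins(
  fit,
  at = data.frame(treat = c(0, 1)),          # set treat to 0 and to 1 for all rows
  contrasts = cbind("AME treat" = c(-1, 1)), # difference between the two predictions
  effects = "integrateoutRE",
  CI = 0.95, CIType = "ETI"
)
ame_treat$ContrastSummary

# AME of a continuous predictor via a small finite difference
h <- 0.001
ame_x <- brmsmargins(
  fit,
  add = data.frame(x = c(0, h)),             # shift x by 0 and by h
  contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
  effects = "integrateoutRE",
  CI = 0.95, CIType = "ETI"
)
ame_x$ContrastSummary
```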

paul-buerkner commented 2 years ago

Perfect! Thank you so much for working on this! Closing this issue now.