Closed dipetkov closed 1 year ago
Thanks a lot for the detailed bug report. I confirm that I see the same issue on my computer.
I'll look at this as soon as I find some time.
Notes to self:
dydx
does not appear to be flipped to dydxavgwts
when wts
is given. OK, I don't think this is a marginaleffects
problem after all. I'm not super familiar with brms
, but it looks like the brms::posterior_epred()
function we use to make predictions returns different results depending on the model object and newdata
structure.
Below, I illustrate this with simple predictions. I show how to get results using raw brms
commands and also with avg_predictions()
. As you will see, the results with the binomial or standard formulations are not equivalent, but the marginaleffects()
functions produce OK results. After those examples, I show you three equivalences with slopes.
I also close this issue because I am not convinced that this is a bug; it has to do with brms
. But feel free to keep the discussion going if you have more insights to bring to bear on the matter.
w <- data_agg$n / sum(data_agg$n)
# correct
avg_predictions(m_01s)
> Estimate 2.5 % 97.5 %
> 0.505 0.454 0.555
>
> Columns: estimate, conf.low, conf.high
> Type: response
posterior_epred(m_01s, newdata = data_01s) |>
apply(1, mean) |>
median()
> [1] 0.50465
# also correct, but the `brms` models do something different under the hood
avg_predictions(m_agg, newdata = data_agg, wts = w)
> Estimate 2.5 % 97.5 %
> 2.53 2.26 2.8
> Columns: estimate, conf.low, conf.high
> Type: response
posterior_epred(m_agg, newdata = data_agg) |>
apply(1, weighted.mean, w = w) |>
median()
> [1] 2.5333
# correct (no weights)
avg_predictions(m_agg, newdata = data_agg)
> Estimate 2.5 % 97.5 %
> 1.81 1.64 1.98
>
> Columns: estimate, conf.low, conf.high
> Type: response
posterior_epred(m_agg, newdata = data_agg) |>
apply(1, mean) |>
median()
> [1] 1.8121
Now three examples of different syntaxes that produce the same results with avg_comparisons()
:
avg_slopes(m_01s)
avg_slopes(m_agg, newdata = data_01s)
avg_slopes(m_01s, newdata = data_agg, wts = w)
I'm not re-opening this issue as there is no error in either brms
or marginaleffects
; just providing an explanation about what I think is going on.
With posterior_epred(m_agg)
, the posterior sample is a vector $[n_i\tilde{p}_i]$ where $n_i$ is the sample size and $\tilde{p}_i$ is the posterior probability. So the sample sizes are weights that have already been applied.
Here is how to reproduce the avg_prediction
with the aggregate data.
n <- data_agg$n
avg_predictions(m_agg, newdata = data_01s)
#> Estimate 2.5 % 97.5 %
#> 0.504 0.459 0.549
posterior_epred(m_agg, newdata = data_agg) |>
# The weights {n_i} have already been applied, so we only need to normalize by sum(n_i).
apply(1, \(x) sum(x) / sum(n)) |>
median()
#> [1] 0.5035349
I think this also means that there are no weights w
such that avg_slopes(m_agg, wts = w)
reproduce the marginal effect correctly. Instead, when we are working with the aggregate model, we can use one of these two methods:
# Model `m_agg` uses weights. In the original, un-collapsed dataset
# each observation is a subset of sample size of 1.
avg_slopes(m_agg, newdata = data_01s %>% mutate(n = 1))
#> Term Estimate 2.5 % 97.5 %
#> x 0.000255 0.000204 0.000297
# The sample sizes `n` get in the way of the `posterior_epred` calculation.
# So "turn those off" by setting `n = 1` and use the weights argument instead.
avg_slopes(m_agg, newdata = data_agg %>% mutate(n = 1), wts = data_agg$n)
#> Term Estimate 2.5 % 97.5 %
#> x 0.000255 0.000204 0.000297
Sorry about the false bug report & thank you for helping me to figure this out.
This is super useful. Thanks for coming around and clarifying. This was not a straightforward issue...
This is a follow-up to issue https://github.com/vincentarelbundock/marginaleffects/issues/813 about using the aggregate binomial model to get marginal estimates that match those obtained by fitting a model on the original (0s and 1s) data.
When the model is fitted with
glm
, a solution is to specify the weights:wts = n / sum(n)
. This solution doesn't seem to work for a binomial regression fitted withbrm
. (In that case the marginal slope can be reproduced withnewdata = original_data
.)Here is a reprex.