Closed andymilne closed 1 year ago
My guess is that this has to do with the order of operations. Note that we take the mean within each draw, but then report the median of the posterior. Here's a couple simple equivalences for you to play with:
library(brms)
library(dplyr)
library(marginaleffects)
mod <- brm(mpg ~ hp + mo(cyl), data = mtcars)
# simplest case
predictions(mod)$predicted
pre |>
posteriordraws() |>
group_by(rowid) |>
summarize(predicted = median(draw)) |>
pull(predicted)
# `by` case
predictions(mod, by = "cyl")$predicted
mod |>
predictions() |>
posteriordraws() |>
group_by(drawid, cyl) |>
summarize(predicted = mean(draw)) |>
group_by(cyl) |>
summarize(predicted = median(predicted)) |>
pull(predicted)
Ah yes, that seems to explain it. Thank you. Is there any way to force the summary() function to output the mean of the predictions/point estimates rather than their median? It seems unintuitive to me, at least, to have the mean applied to the draws but the median applied to the "predictions" (the means of the draws).
To me, the logic is:
by
always gives you the mean of something, unless you specify the byfun
argument.So the defaults make sense to me. But of course, I understand that other people may have different needs, which is precisely why there is a posteriordraws()
function. You can do stuff like:
pred <- predictions(mod)
draw <- posteriordraws(pred, "PxD")
rowMeans(collapse::fmean(draw, g = mtcars$cyl))
I didn't know (1) byfun
– that's useful. I don't agree with (2); for example, to my understanding, brms usually reports the mean of the posterior unless the robust = TRUE
argument is used. Can I request an option be made available for either mean or median? – part of the reason I like to use marginaleffects is precisely to avoid having to deal with the potential errors I might (will) make by manually coding things like rowMeans(collapse::fmean(draw, g = mtcars$cyl))
.
Yeah, that sounds like a useful feature request. Thanks for pushing a bit.
I added a global option to change the methods to summarize posteriors: center and intervals. It is now documented in a separate section in all the function documentations, and also in the brms
vignette. Using the Github dev version you can now do:
library(brms)
library(dplyr)
library(marginaleffects)
options(marginaleffects_posterior_center = mean)
dat <- mutate(mtcars, cyl = factor(cyl))
mod <- brm(mpg ~ hp + cyl, data = dat)
predictions(mod, newdata = datagridcf(cyl = unique), by = "cyl") |>
tidy() |>
pull(estimate)
[1] 25.09969 19.12088 16.56856
predictions(mod, newdata = datagridcf(cyl = unique)) |>
posteriordraws() |>
group_by(cyl) |>
summarize(predicted = mean(draw)) |>
pull(predicted)
[1] 25.09969 19.12088 16.56856
FYI, the documentation website is now updated. Two relevant links:
This is excellent – many thanks for implementing this and it's great to have the ETI/HDI option for the credible intervals too.
Apologies if this is part question partly a possible bug report. This link https://vincentarelbundock.github.io/marginaleffects/articles/predictions.html#average-adjusted-predictions-by-group implies that the two methods shown (using the by argument or using dplyr's summarise) should produce the same means. In my example, it doesn't and I don't understand why. If you need replicable code, I can provide but maybe the example below is sufficient -- the model is a brms model with the beta_binomial family: