Closed by alesvomacka 1 year ago
You need to specify weights (counts/N) with the wts argument.
@vincentarelbundock thank you! Using weights worked. For future reference, here is the code:
library(dplyr)            # filter(), mutate(), count(), pull()
library(palmerpenguins)   # penguins data
library(marginaleffects)  # avg_slopes()

# total number of observations in the model
n_obs <- nrow(penguins[!is.na(penguins$sex), ])

# aggregate to one row per sex x body mass combination
penguins_agg <- penguins |>
  filter(!is.na(sex)) |>
  mutate(body_mass_g = as.factor(body_mass_g)) |>
  count(sex, body_mass_g, name = "success", .drop = FALSE) |>
  mutate(body_mass_g = as.numeric(as.character(body_mass_g))) |>
  mutate(total = sum(success),
         fail = total - success,
         .by = body_mass_g)

m_binom <- glm(cbind(success, fail) ~ sex * body_mass_g,
               data = penguins_agg, family = binomial())

# weight of each row: share of all observations at that body mass
w <- penguins_agg |>
  mutate(weight = sum(success) / n_obs,
         .by = body_mass_g) |>
  pull(weight)

avg_slopes(m_binom, variables = "body_mass_g", wts = w,
           by = "sex")
# Term         Contrast     sex     Estimate   Std. Error  z      Pr(>|z|)  2.5 %      97.5 %
# body_mass_g  mean(dY/dX)  female  -0.000253  2.37e-05    -10.7  <0.001    -0.000300  -0.000207
# body_mass_g  mean(dY/dX)  male     0.000253  2.36e-05     10.7  <0.001     0.000207   0.000300
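
A short note on why count-based weights should recover the logistic-regression AME. This is a sketch in notation introduced here for illustration, not taken from the thread: p(x) is the fitted P(male | body mass x), x_1, ..., x_J are the distinct body masses, n_j is the number of penguins observed at x_j, and N is the total number of penguins.

```latex
\[
\mathrm{AME}
  = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial p(x_i)}{\partial x}
  = \sum_{j=1}^{J} \frac{n_j}{N} \, \frac{\partial p(x_j)}{\partial x},
\qquad N = \sum_{j=1}^{J} n_j .
\]
```

The weight n_j / N is what `sum(success) / n_obs` computes for each body mass. An unweighted mean over the aggregated rows uses 1 / J instead, so the two agree only when every body mass is equally frequent.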
Hi, I'm not sure if this is the right place to ask - if not, I'd appreciate being pointed in the right direction.
I'm trying to match the results of a logistic regression fitted to "expanded" data (one row per observation) with those of a binomial regression fitted to aggregated data. The idea is that working with aggregated data is much more computationally efficient, especially with big samples or Bayesian models, while the results should be identical because the underlying data-generating process is the same.
The problem: While both models give the same predictions, their AMEs don't match.
Models
I've estimated two models using the Palmer penguins dataset. The first is a classical binary logistic regression predicting sex from body mass. The second is a binomial model predicting the number of "successes", i.e. penguin occurrences, from sex and body mass:
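
The original code for the two models is not preserved here, so the following is only a minimal sketch. The logistic formula `sex ~ body_mass_g` is an assumption based on the description above, the binomial model follows the aggregation posted in the follow-up comment, and the names `penguins_cc`, `m_logit`, and `grid` are hypothetical.

```r
library(dplyr)
library(palmerpenguins)

# One row per penguin, dropping rows with missing sex
penguins_cc <- penguins |>
  filter(!is.na(sex))

# Model 1 (assumed formula): binary logistic regression, P(male | body mass)
m_logit <- glm(sex ~ body_mass_g, data = penguins_cc, family = binomial())

# Model 2: binomial regression on counts per sex x body mass cell
penguins_agg <- penguins_cc |>
  mutate(body_mass_g = as.factor(body_mass_g)) |>
  count(sex, body_mass_g, name = "success", .drop = FALSE) |>
  mutate(body_mass_g = as.numeric(as.character(body_mass_g))) |>
  mutate(total = sum(success),
         fail = total - success,
         .by = body_mass_g)

m_binom <- glm(cbind(success, fail) ~ sex * body_mass_g,
               data = penguins_agg, family = binomial())

# Compare predicted P(male) on a small, arbitrary grid of body masses
grid <- data.frame(
  sex = factor("male", levels = levels(penguins_cc$sex)),
  body_mass_g = c(3000, 4000, 5000)
)
p_logit <- predict(m_logit, newdata = grid, type = "response")
p_binom <- predict(m_binom, newdata = grid, type = "response")

# Agreement check, up to numerical tolerance of the two fits
all.equal(unname(p_logit), unname(p_binom), tolerance = 1e-6)
```

For rows with `sex = "female"` the binomial model predicts P(female | body mass), which is one minus the logistic prediction, so the comparison uses the male rows only.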
These two models give the same predicted values. However, I've hit a roadblock when trying to compute average marginal effects.
AMEs don't match
The binary logistic model gives the correct AME for body mass:
However, the AME for the second model is incorrect, because the individual-level MEs for males and females cancel each other out:
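
The cancellation can be written out explicitly. As a sketch in notation introduced here (not from the original post): let p(x) be the fitted P(male | body mass x) and x_1, ..., x_J the distinct body masses, each appearing once as a male row and once as a female row in the aggregated data. The female rows model 1 - p(x), so their slopes are the exact negatives of the male slopes:

```latex
\[
\frac{\partial}{\partial x}\bigl[1 - p(x)\bigr] = -\frac{\partial p(x)}{\partial x}
\quad\Longrightarrow\quad
\frac{1}{2J} \sum_{j=1}^{J}
  \left[ \frac{\partial p(x_j)}{\partial x} - \frac{\partial p(x_j)}{\partial x} \right] = 0 .
\]
```

Averaging over all rows of the aggregated data therefore returns zero regardless of how strongly body mass predicts sex.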
Group level AMEs match
I can get (almost exactly) matching results by computing the AME for a specific sex:
Unfortunately, while the group-level AMEs match across models, they don't match the "global" AME (AME = 0.000253 vs G-AME = 0.000242).
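
One likely source of the gap between the two numbers is the averaging scheme: the expanded data average over penguins, while the aggregated data have one row per distinct body mass within each sex. The sketch below checks this by hand in base R, using the analytic logit slope b * p * (1 - p) rather than marginaleffects. The logistic formula and the names `slope_at`, `n_x`, and `ame_*` are assumptions made for illustration.

```r
library(dplyr)
library(palmerpenguins)

d <- penguins |>
  filter(!is.na(sex))

# Assumed specification of the binary logistic model: P(male | body mass)
m_logit <- glm(sex ~ body_mass_g, data = d, family = binomial())

# Analytic slope of P(male | mass) under a logit link: b * p * (1 - p)
b <- coef(m_logit)[["body_mass_g"]]
slope_at <- function(mass) {
  p <- predict(m_logit, newdata = data.frame(body_mass_g = mass),
               type = "response")
  unname(b * p * (1 - p))
}

# One row per distinct body mass, with the number of penguins at that mass
n_x <- d |>
  count(body_mass_g, name = "n")

# Average over penguins (expanded data)
ame_rows <- mean(slope_at(d$body_mass_g))

# Average over distinct body masses, every mass counted once
ame_unweighted <- mean(slope_at(n_x$body_mass_g))

# Average over distinct body masses, weighted by counts / N
ame_weighted <- weighted.mean(slope_at(n_x$body_mass_g),
                              w = n_x$n / sum(n_x$n))

c(rows = ame_rows, unweighted = ame_unweighted, weighted = ame_weighted)
```

The count-weighted average equals the per-penguin average by construction, whereas the unweighted one treats rare and common body masses alike.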
How to get correct "global" AME from the binomial model?
Is there a way to get the correct "global" AME from the binomial model? I'm honestly not sure whether the mismatch is due to me using this package incorrectly or whether it's an unintuitive quirk of binomial models.
Thanks for the help!
Technical info
R version 4.3.0
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4
Packages: