rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/

Is there a way to incorporate sampling weights when using emmeans #443

Closed lydiazhangnyspi closed 1 year ago

lydiazhangnyspi commented 1 year ago

I am trying to use emmeans to get adjusted marginal means from a model with sampling weights. From my understanding, the weights option in the emmeans function uses either equal weights or the cell sizes, but I cannot get it to use cell sizes adjusted for the sampling weights.

rvlenth commented 1 year ago

It only uses cell sizes if the model does not contain weights.

You should first put your weights in the model itself in the appropriate way. Then the predictions on the reference grid will be weighted correctly. emmeans will have the weighting information it needs in case you want to compute non-equally-weighted marginal means from the means in the reference grid.

Example

We fit a model with prior weights on the observations...

> library(emmeans)   # loads emmeans and its pigs data set
> set.seed(9.18)
> wts <- rpois(29, 12)
> mod <- lm(inverse(conc) ~ source + factor(percent), data = pigs, weights = wts)

The weights are kept track of and you can see them if you look at the reference grid:

> ref_grid(mod)@grid
   source percent .wgt.
1    fish       9    18
2     soy       9    43
3    skim       9    44
4    fish      12    31
5     soy      12    47
6    skim      12    32
7    fish      15    22
8     soy      15    37
9    skim      15    27
10   fish      18    42
11    soy      18     9
12   skim      18    12

What you do with these weights is a separate matter. Here's what you get if you use proportional weights:

> emmeans(mod, "percent", weights = "prop")
 percent emmean       SE df lower.CL upper.CL
       9 0.0322 0.001036 23   0.0301   0.0344
      12 0.0271 0.000998 23   0.0250   0.0291
      15 0.0261 0.001129 23   0.0237   0.0284
      18 0.0240 0.001393 23   0.0211   0.0269

Results are averaged over the levels of: source 
Results are given on the inverse (not the response) scale. 
Confidence level used: 0.95 

These means were weighted proportionally, i.e. the weight given to fish is 18 + 31 + 22 + 42 = 113, versus 43 + 47 + 37 + 9 = 136 for soy and 44 + 32 + 27 + 12 = 115 for skim. The same 3 weights are used for each of the four marginal means shown.
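
As a rough check on that arithmetic, the proportional weights can be recomputed in base R. This is only a sketch: the data frame re-types the .wgt. column from the reference grid printed above, and the names grid, src_wts, and prop_wts are illustrative, not part of emmeans.

```r
# Illustrative copy of the ref_grid(mod)@grid output shown earlier
grid <- data.frame(
  source  = rep(c("fish", "soy", "skim"), times = 4),
  percent = rep(c(9, 12, 15, 18), each = 3),
  .wgt.   = c(18, 43, 44, 31, 47, 32, 22, 37, 27, 42, 9, 12)
)

# Total weight for each source, summed over the four percent levels
src_wts <- tapply(grid$.wgt., grid$source, sum)

# Normalize so the three source weights sum to 1
prop_wts <- src_wts / sum(src_wts)

print(src_wts)
print(round(prop_wts, 3))
```

The three totals are 113 (fish), 136 (soy), and 115 (skim), and the same normalized set is applied to every percent level.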

On the other hand, if you used weights = "cells", it would use a different set of 3 weights for each mean. That reproduces the weighted marginal means of the data set itself, and those means are confounded with source effects, whereas the proportionally weighted (or equally weighted) ones are not.
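
To see why the "cells" weights differ from one mean to the next, here is a base R sketch of the weighting scheme as described above. It is not emmeans' internal code; the data frame again re-types the .wgt. column from the reference grid, and the names grid and cell_wts are illustrative.

```r
# Illustrative copy of the ref_grid(mod)@grid output shown earlier
grid <- data.frame(
  source  = rep(c("fish", "soy", "skim"), times = 4),
  percent = rep(c(9, 12, 15, 18), each = 3),
  .wgt.   = c(18, 43, 44, 31, 47, 32, 22, 37, 27, 42, 9, 12)
)

# Arrange the cell weights as a percent-by-source table
cell_wts <- with(grid, tapply(.wgt., list(percent, source), sum))

# Normalize within each percent level (each row then sums to 1),
# giving one set of 3 source weights per marginal mean
cell_wts <- cell_wts / rowSums(cell_wts)

print(round(cell_wts, 3))
```

With these numbers, the share given to fish runs from 18/105 (about 0.17) at 9 percent to 42/63 (about 0.67) at 18 percent, so each percent mean averages over a different mix of sources. That is the confounding with source effects mentioned above.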

rvlenth commented 1 year ago

PS -- Make sure you update emmeans. A couple of versions ago, there was a glitch that prevented weights from being tracked in certain circumstances.

rvlenth commented 1 year ago

Closing this issue as I think it is completed.