Comparing each level to a weighted mean of all other levels

walrossker commented 4 months ago

In a one-way ANOVA, I want to compare each level's mean to the mean of all other samples ignoring group/level. My understanding is that the del.eff contrast method compares each level with the "average of averages" of all other levels. I thought the weights = "proportional" option might do what I'm aiming for, but it looks like a one-way ANOVA returns the same output regardless of weighting:

library(emmeans)

# Create data
dat <- mtcars
dat$cyl <- factor(dat$cyl)

# Run ANOVA
mod <- aov(mpg ~ cyl, data = dat)

# emmeans w/ equal weighting
emmeans(mod, del.eff ~ cyl, weights = "equal")
#> $emmeans
#>  cyl emmean    SE df lower.CL upper.CL
#>  4     26.7 0.972 29     24.7     28.7
#>  6     19.7 1.218 29     17.3     22.2
#>  8     15.1 0.861 29     13.3     16.9
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast    estimate   SE df t.ratio p.value
#>  cyl4 effect     9.24 1.23 29   7.544  <.0001
#>  cyl6 effect    -1.14 1.38 29  -0.825  0.4161
#>  cyl8 effect    -8.10 1.16 29  -6.976  <.0001
#> 
#> P value adjustment: fdr method for 3 tests

# emmeans w/ proportional weighting
emmeans(mod, del.eff ~ cyl, weights = "proportional")
#> $emmeans
#>  cyl emmean    SE df lower.CL upper.CL
#>  4     26.7 0.972 29     24.7     28.7
#>  6     19.7 1.218 29     17.3     22.2
#>  8     15.1 0.861 29     13.3     16.9
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast    estimate   SE df t.ratio p.value
#>  cyl4 effect     9.24 1.23 29   7.544  <.0001
#>  cyl6 effect    -1.14 1.38 29  -0.825  0.4161
#>  cyl8 effect    -8.10 1.16 29  -6.976  <.0001
#> 
#> P value adjustment: fdr method for 3 tests

# emmeans w/ cell weighting
emmeans(mod, del.eff ~ cyl, weights = "cell")
#> $emmeans
#>  cyl emmean    SE df lower.CL upper.CL
#>  4     26.7 0.972 29     24.7     28.7
#>  6     19.7 1.218 29     17.3     22.2
#>  8     15.1 0.861 29     13.3     16.9
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast    estimate   SE df t.ratio p.value
#>  cyl4 effect     9.24 1.23 29   7.544  <.0001
#>  cyl6 effect    -1.14 1.38 29  -0.825  0.4161
#>  cyl8 effect    -8.10 1.16 29  -6.976  <.0001
#> 
#> P value adjustment: fdr method for 3 tests

I can specify a custom contrast to get the tests I want (i.e., 10 is the mean difference for cyl == 4 that I was looking for, not the 9.24 above):

# Get the proportions for each level excluding focal level (i.e., cyl == 4)
props <- prop.table(table(dat[!dat$cyl == "4", "cyl"]))

# Run contrast comparing cyl == 4 mean to prop-weighted mean of other levels
cyl_4_contr <- c(1, 0, 0)
all_but_cyl_4_contr <- c(0, props["6"], props["8"])
emmeans(mod, "cyl", contr = list(cyl_4_contr - all_but_cyl_4_contr))
#> $emmeans
#>  cyl emmean    SE df lower.CL upper.CL
#>  4     26.7 0.972 29     24.7     28.7
#>  6     19.7 1.218 29     17.3     22.2
#>  8     15.1 0.861 29     13.3     16.9
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast                                                 estimate  SE df
#>  c(1, "6" = -0.333333333333333, "8" = -0.666666666666667)       10 1.2 29
#>  t.ratio p.value
#>    8.349  <.0001

...but this would require a wrapper function to iterate over all levels. So I'm curious if emmeans can already do this in another way or if you have any other recommendations. Would also like to be able to do this for the "eff" method (i.e., compare each level to the overall mean ignoring group/level). Thanks

rvlenth commented 4 months ago

This looks very similar to #346, so check that out.

I think your confusion is that weights is an argument for emmeans() to specify what kind of averaging to use when computing marginal means when averaging over other factors. Here, you have only the one factor. Moreover, weights is not an argument to contrast().

I also would add that I spend a lot of time regretting that I provided that two-sided formula interface, because it has you doing two different things: means and contrasts. That creates confusion because it's not always obvious which of those tasts is affected by optional arguments that you provide. I strongly recommend keeping means and contrasts separate.

Here's an approach to your example for obtaining eff-style contrasts against the weighted grand mean. First, get the means and the weighted grand mean:

> emm = emmeans(mod, ~ cyl)
> gm = emmeans(emm, ~ 1, weights = "prop")

Then combine those into one object:

> ( tmp = rbind(emm, gm) )
 cyl 1       emmean    SE df lower.CL upper.CL
 4   .         26.7 0.972 29     24.1     29.3
 6   .         19.7 1.218 29     16.5     23.0
 8   .         15.1 0.861 29     12.8     17.4
 .   overall   20.1 0.570 29     18.6     21.6

Results are averaged over some or all of the levels of: cyl 
Confidence level used: 0.95 
Conf-level adjustment: bonferroni method for 4 estimates

Finally, contrast each of these with the last one:

> contrast(tmp, "trt.vs.ctrlk")
 contrast        estimate    SE df t.ratio p.value
 4 . - . overall    6.573 0.787 29   8.349  <.0001
 6 . - . overall   -0.348 1.077 29  -0.323  0.9576
 8 . - . overall   -4.991 0.646 29  -7.725  <.0001
Results are averaged over some or all of the levels of: cyl 
P value adjustment: dunnettx method for 3 tests

It's harder doing the del.eff ones, but see that other issue.

Note: Making gm is an instance where weights makes a difference. Without weights, we get

> emmeans(emm, ~1)
 1       emmean    SE df lower.CL upper.CL
 overall   20.5 0.594 29     19.3     21.7

Results are averaged over the levels of: cyl 
Confidence level used: 0.95

The weighted grand mean is 20.1

rvlenth commented 4 months ago

Hmmm, just this same day, I ran across the contrMat() function in multcomp package. I think del.eff corresponds to type = "AVE" and eff corresponds to type = "GrandMean", though I don't readily see where this is documented.

In any case, you need to transpose the matrix this function produces. Example:

> library(multcomp)
> wts = emm@grid$.wgt.
> mth = data.frame(t(contrMat(wts, "AVE")))
> mth
         C.1   C.2        C.3
4  1.0000000 -0.44 -0.6111111
6 -0.3333333  1.00 -0.3888889
8 -0.6666667 -0.56  1.0000000

> contrast(emm, mth)
 contrast estimate   SE df t.ratio p.value
 C.1        10.016 1.20 29   8.349  <.0001
 C.2        -0.445 1.38 29  -0.323  0.7490
 C.3        -8.872 1.15 29  -7.725  <.0001

Using data.frame(t(X)) transposes the matrix and creates a data frame, which is really a list of its columns, so it works with the method argument of contrast().

rvlenth commented 4 months ago

Well, if I don't feel like a fool. It would help both you and me to read the documentation. There is already an optional wts argument in del.eff.emmc and eff.emmc. (I think I added that feature in response to #346, actually.) All you have to do is use 'em!

> w = emm@grid$.wgt.
> contrast(emm, "del.eff", wts = w)
 contrast    estimate   SE df t.ratio p.value
 cyl4 effect   10.016 1.20 29   8.349  <.0001
 cyl6 effect   -0.445 1.38 29  -0.323  0.7490
 cyl8 effect   -8.872 1.15 29  -7.725  <.0001

P value adjustment: fdr method for 3 tests 

> contrast(emm, "eff", wts = w)
 contrast    estimate    SE df t.ratio p.value
 cyl4 effect    6.573 0.787 29   8.349  <.0001
 cyl6 effect   -0.348 1.077 29  -0.323  0.7490
 cyl8 effect   -4.991 0.646 29  -7.725  <.0001

P value adjustment: fdr method for 3 tests

In the next update, I am adding a weights() method so we can use weights(emm) instead of what I did above for w.

rvlenth commented 4 months ago

I think this is resolved, so am closing this issue

walrossker commented 4 months ago

@rvlenth Thanks for looking into this! That’s already a good solution, and will keep an eye out for the update

rvlenth / emmeans

Comparing each level to a weighted mean of all other levels #469