rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
358 stars 31 forks source link

emmeans gives wrong results for `afex_aov`s if the `at = ` argument is reversed #430

Closed Joao-O-Santos closed 1 year ago

Joao-O-Santos commented 1 year ago

Describe the bug

When I run emmeans(model, ~ Var, at = list(Var = c("levelx", "levely"))) the results get switched for afex_aovs if I place levelx and levely in a different order, but this doesn't happen for models of class afex_mixed, and it probably won't happen for other models.

To reproduce

library(emmeans)

df <- data.frame(a = rnorm(300), b = rnorm(300, 3), pp = 1:300)
df <- tidyr::pivot_longer(df, -"pp")
afex_aov <- afex::aov_4(value ~ name + (name | pp), df)
afex_mixed <- afex::mixed(value ~ name + (1 | pp), df)

ordered <- list(name = c("a", "b"))
reversed <- list(name = c("b", "a")

Expected behavior

If we list the factors in the order they appear we always get correct results.

emmeans(afex_aov, ~ name, at = ordered)
emmeans(afex_mixed, ~ name, at = ordered)

If we reverse the names of the factors we get wrong values in for the model of class afex_aov.

emmeans(afex_aov, ~ name, at = reversed)
emmeans(afex_mixed, ~ name, at = reversed)

How can I help?

Thank you so much for developing emmeans it is such an amazing package! I noticed in the README that you are looking for another maintainer, I'm afraid I can't take that burden right now, but let me know if there's any way I can help, with this bug, or with other smaller tasks.

Sincery, J

P.S: I checked the ground rules.

Ground rules

Make sure that you have used things as they are documented.

It might be user error on my part, but I found no documentation about having to list the labels in the correct order for afex_aovs but not for other models.

More than one or two pipes is usually too many.

Check. I used no pipes.

Did you know that there is an index of vignette topics?

I looked up the vignettes and found no mention of this, but I might have missed something.

Please examine the output from what you have tried.

There are no annotations below the output.

I also looked at the afex package to see if it was implementing this functionality. I couldn't tell for sure if that was the case, so I submitted this report and cross-posted it in the afex package as well.

rvlenth commented 1 year ago

Hmmmm. The issue isn't really in afex per se, it is in how multivariate models are handled. Specifically, the at argument comes into play twice: once when we select which predictor levels to use, and again when we select which multivariate levels to keep. I'll do a parallel example using one of emmeans's provided datasets:

library(emmeans)
MOats.lm <- lm (yield ~ Block + Variety, data = MOats)

# All levels of Variety
emmeans(MOats.lm, "Variety")
##  Variety     emmean   SE df lower.CL upper.CL
##  Golden Rain  104.5 5.01 10     93.3      116
##  Marvellous   109.8 5.01 10     98.6      121
##  Victory       97.6 5.01 10     86.5      109
## 
## Results are averaged over the levels of: Block, rep.meas 
## Confidence level used: 0.95

# Two levels in order
emmeans(MOats.lm, "Variety", at = list(Variety = c("Golden Rain", "Victory")))
##  Variety     emmean   SE df lower.CL upper.CL
##  Golden Rain  104.5 5.01 10     93.3      116
##  Victory       97.6 5.01 10     86.5      109
## 
## Results are averaged over the levels of: Block, rep.meas 
## Confidence level used: 0.95

# Two levels in reverse order
emmeans(MOats.lm, "Variety", at = list(Variety = rev(c("Golden Rain", "Victory"))))
##  Variety     emmean   SE df lower.CL upper.CL
##  Victory       97.6 5.01 10     86.5      109
##  Golden Rain  104.5 5.01 10     93.3      116
## 
## Results are averaged over the levels of: Block, rep.meas 
## Confidence level used: 0.95

########################################################
# All levels of rep.meas
emmeans(MOats.lm, "rep.meas")
##  rep.meas emmean   SE df lower.CL upper.CL
##  0          79.4 3.20 10     72.3     86.5
##  0.2        98.9 3.81 10     90.4    107.4
##  0.4       114.2 5.02 10    103.0    125.4
##  0.6       123.4 4.22 10    114.0    132.8
## 
## Results are averaged over the levels of: Block, Variety 
## Confidence level used: 0.95

# Three levels in order
emmeans(MOats.lm, "rep.meas", at = list(rep.meas = c(0,.2,.6)))
##  rep.meas emmean   SE df lower.CL upper.CL
##       0.0   79.4 3.20 10     72.3     86.5
##       0.2   98.9 3.81 10     90.4    107.4
##       0.6  123.4 4.22 10    114.0    132.8
## 
## Results are averaged over the levels of: Block, Variety 
## Confidence level used: 0.95

# Three levels in reverse order
emmeans(MOats.lm, "rep.meas", at = list(rep.meas = rev(c(0,.2,.6))))
##  rep.meas emmean   SE df lower.CL upper.CL
##       0.6   79.4 3.20 10     72.3     86.5
##       0.2   98.9 3.81 10     90.4    107.4
##       0.0  123.4 4.22 10    114.0    132.8
## 
## Results are averaged over the levels of: Block, Variety 
## Confidence level used: 0.95

Created on 2023-07-13 with reprex v2.0.2

What we see is that at is working right for Variety, which is a predictor, but not for rep.meas, which is a multivariate ressponse. I'll try to fix this.

Thanks for reporting this. I'll post again when I have fixed it.

rvlenth commented 1 year ago

The code I had worked by excluding the levels not used, but not re-ordering the rest of it. I re-coded this section so it picks out the cases that match the specified levels. It seems to work right now:

library(emmeans)
MOats.lm <- lm (yield ~ Block + Variety, data = MOats)

# All levels of rep.meas
emmeans(MOats.lm, "rep.meas")
##  rep.meas emmean   SE df lower.CL upper.CL
##  0          79.4 3.20 10     72.3     86.5
##  0.2        98.9 3.81 10     90.4    107.4
##  0.4       114.2 5.02 10    103.0    125.4
##  0.6       123.4 4.22 10    114.0    132.8
## 
## Results are averaged over the levels of: Block, Variety 
## Confidence level used: 0.95

# Three levels in order
emmeans(MOats.lm, "rep.meas", at = list(rep.meas = c(0,.2,.6)))
##  rep.meas emmean   SE df lower.CL upper.CL
##       0.0   79.4 3.20 10     72.3     86.5
##       0.2   98.9 3.81 10     90.4    107.4
##       0.6  123.4 4.22 10    114.0    132.8
## 
## Results are averaged over the levels of: Block, Variety 
## Confidence level used: 0.95

# Three levels in reverse order
emmeans(MOats.lm, "rep.meas", at = list(rep.meas = rev(c(0,.2,.6))))
##  rep.meas emmean   SE df lower.CL upper.CL
##       0.6  123.4 4.22 10    114.0    132.8
##       0.2   98.9 3.81 10     90.4    107.4
##       0.0   79.4 3.20 10     72.3     86.5
## 
## Results are averaged over the levels of: Block, Variety 
## Confidence level used: 0.95

# All levels plus an illegal one, haphazard order
emmeans(MOats.lm, "rep.meas", at = list(rep.meas = c(.2,.6,.4,.75,0)))
##  rep.meas emmean   SE df lower.CL upper.CL
##       0.2   98.9 3.81 10     90.4    107.4
##       0.6  123.4 4.22 10    114.0    132.8
##       0.4  114.2 5.02 10    103.0    125.4
##       0.0   79.4 3.20 10     72.3     86.5
## 
## Results are averaged over the levels of: Block, Variety 
## Confidence level used: 0.95

Created on 2023-07-13 with reprex v2.0.2

I'll push this up soon.

Joao-O-Santos commented 1 year ago

Thank you, so, so much for being on top of this and for fixing it so fast, Prof. Lenth!

P.S: feel free to close the issue.

rvlenth commented 1 year ago

Closing this issue as resolved