Closed Dr-Robert-Mason closed 1 year ago
postscript: have reproduced the same issue in the lsmeans package.
I think this is an example of expecting computers to read one's mind, as opposed to just doing literally what it is told. Please notice:
> head(warpbreaks)
breaks wool tension
1 26 A L
2 30 A L
3 54 A L
4 25 A L
5 70 A L
6 52 A L
> head(trialdata)
breaks wool tension
1 26 B H
2 30 B H
3 54 B H
4 25 B H
5 70 B H
6 52 B H
Observe that the first six rows of each data set are the same, but with different labels for the factors. You asked to relabel the factors, not to reorder them; and that's what it did.
To reinforce this, note that the coefficients of the two models are identical, but with different names
> summary(warp.lm1)$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.55556 3.646761 12.217842 2.425903e-16
woolB -16.33333 5.157299 -3.167032 2.676803e-03
tensionM -20.55556 5.157299 -3.985721 2.280796e-04
tensionH -20.00000 5.157299 -3.877999 3.199282e-04
woolB:tensionM 21.11111 7.293523 2.894501 5.698287e-03
woolB:tensionH 10.55556 7.293523 1.447251 1.543266e-01
> summary(warp.lm2)$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.55556 3.646761 12.217842 2.425903e-16
woolA -16.33333 5.157299 -3.167032 2.676803e-03
tensionL -20.55556 5.157299 -3.985721 2.280796e-04
tensionM -20.00000 5.157299 -3.877999 3.199282e-04
woolA:tensionL 21.11111 7.293523 2.894501 5.698287e-03
woolA:tensionM 10.55556 7.293523 1.447251 1.543266e-01
So really, nothing has changed, and emmeans()
just did what it is supposed to do, but with different labels. There is no bug here.
To re-order the levels of factors, you need to call factor()
so as to create a new factor with the altered levels:
> foo = warpbreaks
> foo = transform(warpbreaks,
+ wool = factor(wool, levels = c("B", "A")),
+ tension = factor(tension, levels = c("H", "L", "M")))
> head(foo)
breaks wool tension
1 26 A L
2 30 A L
3 54 A L
4 25 A L
5 70 A L
6 52 A L
> ref_grid(lm(breaks ~ wool*tension, data = foo))
'emmGrid' object with variables:
wool = B, A
tension = H, L, M
Regarding lsmeans, that package started with a very different internal architecture than emmeans, but is now just a front end for emmeans, and exists just so old textbook examples will still work. Even the lsmeans()
function itself is in emmeans. So tests in the lsmeans package will come out the same as in the emmeans package.
I believe this issue is resolved, so I am closing it. That does not prevent further comments.
PS -- You could also re-order the levels using reorder()
or relevel()
. See their help pages.
Thanks for the response and the explanation of what actually caused the apparent issue, that is helpful and much appreciated.
After using levels() to reorder the levels of a factor in a workflow that later involves the application of emmeans, the numerical output thereof is erroneously labelled. Specifically, though the order of the factor levels in emmeans output corresponds to the order that was specified using levels(), the numerical outputs correspond to the default ordering of factor levels in the dataset that would have resulted had levels() not been used earlier in the workflow.
For a reproducible example, run the following block of code:
warp.lm1 <- lm(breaks ~ wool tension, data = warpbreaks) w.t1 <- pairs(emmeans(warp.lm1, ~ wool | tension)) t.w1 <- pairs(emmeans(warp.lm1, ~ tension | wool)) combined1<-rbind(w.t1, t.w1) trialdata<-warpbreaks levels(trialdata$wool)<-c("B","A") levels(trialdata$tension)<-c("H","L","M") warp.lm2 <- lm(breaks ~ wool tension, data = trialdata) w.t2 <- pairs(emmeans(warp.lm2, ~ wool | tension)) t.w2 <- pairs(emmeans(warp.lm2, ~ tension | wool)) combined2<-rbind(w.t2, t.w2)
Compare tw.1 to tw.2. The rows of numerical output in the "estimate", "SE", "df", "t.ratio", and "p.value" columns are ordered identically in both outputs but the order of the "wool" groups and the order of the "tension" contrasts within each "wool" group is different in tw.1 and tw.2.
Compare combined1 and combined2 to see another illustration of this issue.
(Example dataset and variable names inspired by the post at: https://stats.stackexchange.com/questions/165125/lsmeans-r-adjust-for-multiple-comparisons-with-interaction-terms)