rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
358 stars 31 forks source link

What is eff_size() actually computing for linear mixed models' pairwise comparisons? #398

Closed Larissa-Cury closed 1 year ago

Larissa-Cury commented 1 year ago

I'm wondering what is eff_size() actually computing when I use it to calculate effect sizes of my lmer:

mod1<- lmer(CONT_Y ~  MY_GROUP * YEAR + (1|ID), data = dfModels, REML = FALSE)

emm1 <- emmeans(mod1,~YEAR|MY_GROUP) ## these are my pairwise comparisons 
emm2 <- emmeans(mod1,~MY_GROUP|YEAR) ## these are my pairwise comparisons 

eff1 <- eff_size(emm1, sigma = sigma(mod1), edf = df.residual(mod1))   ## here
eff1 <- eff_size(emm2, sigma = sigma(mod1), edf = df.residual(mod1))   ## and here

The documentation says "Most popular is probably Cohen’s d, which is defined as the observed difference, divided by the population SD; and obviously Cohen effect sizes are close cousins of pairwise differences. They are available via the eff_size() function, where the user must specify the emmGrid object with the means to be compared". Ok, from this I get that it's Cohen's D. However, it's not mentioned on the function's doc

I've also seen elsewhere that, using the syntax above, it would be actually calculating partial eta. So this issue is for some clarification on this topic, I'd like to thank in advance. I'm wondering if it will always return Cohen's d or if, given a different syntax such as mine, it would calculate another estimate.

side question: why won't eff_size() work on the output of emmeans(mod1,~YEAR|MY_GROUP) %>% pairs(adjust="Tukey") ?

rvlenth commented 1 year ago

As is documented,

This function uses calls to regrid to put the estimated marginal means (EMMs) on the log scale. Then an extra element is added to this grid for the log of sigma and its standard error (where we assume that sigma is uncorrelated with the log EMMs). Then a call to contrast subtracts log{sigma} from each of the log EMMs, yielding values of log(EMM/sigma). Finally, the results are re-gridded back to the original scale and the desired contrasts are computed using method. In the log-scaling part, we actually rescale the absolute values and keep track of the signs.

Note that these results are pairwise comparisons of the form (EMM[i] - EMM[j]) / sigma -- that is, Cohen's d. See also this vignette section

Regarding the side question, he first example in the help page shows how to use it after a call to pairs():

fiber.lm <- lm(strength ~ diameter + machine, data = fiber)

emm <- emmeans(fiber.lm, "machine")
eff_size(emm, sigma = sigma(fiber.lm), edf = df.residual(fiber.lm))

# or equivalently:
eff_size(pairs(emm), sigma(fiber.lm), df.residual(fiber.lm), method = "identity")

Personally, I am not a fan of effect sizes, in part because they are ambiguously defined (or perhaps not defined at all) for anything other than a simple one-treatment model with no random effects. In fact, I saw a blog entry once by Lance Waller at Emory that suggests that's exactly what he recommends:

simp <- lm(CONT_Y ~  MY_GROUP * YEAR), data = dfModels)
eff_size(simp, sigma = sigma(simp), edf = df.residual(simpEMM, by = "MY_GROUP"))
eff_size(simp, sigma = sigma(simp), edf = df.residual(simpEMM, by = "YEAR"))

I'm quite sure you have both edf and sigma wrong if you insist on using the mixed model. The edf should be no larger than the smallest d.f. from pairs() using the K-R degrees of freedom. For sigma, you should probably use the total SD, which combines the error and subject SDs. Look at VarCorr(mod1) and then compute sigma = sqrt((error SD)^2 + (subject SD)^2)

Larissa-Cury commented 1 year ago

VarCorr(mod1) and then compute sigma = sqrt((error SD)^2 + (subject SD)^2)

Thank you very much for your kind reply and suggestions, it really helps me since this is my first model to be actually published. All right, I got the sigma from VarCorr(mod1) , but I'm not so sure I got "The edf should be no larger than the smallest d.f. from pairs() using the K-R degrees of freedom". This is my output:

> emmeans(mod1,~MY_GROUP|YEAR) %>% pairs(adjust="Tukey")
YEAR = 2020:
 contrast estimate   SE df t.ratio p.value
 G1 - G2    -0.915 0.53 60  -1.726  0.0894

YEAR = 2021:
 contrast estimate   SE df t.ratio p.value
 G1 - G2    -0.312 0.53 60  -0.590  0.5577

Degrees-of-freedom method: kenward-roger 
> emmeans(mod1,~YEAR|LMY_GROUP) %>% pairs(adjust="Tukey")
MY_GROUP = G1:
 contrast            estimate   SE df t.ratio p.value
 YEAR2020 - YEAR2021   -1.144 0.53 60  -2.159  0.0349

MY_GROUP = G2:
 contrast            estimate   SE df t.ratio p.value
 YEAR2020 - YEAR2021   -0.541 0.53 60  -1.022  0.3110

Degrees-of-freedom method: kenward-roger 

The df equals 60. Is there a way I can get the smallest df to pass it to edf, then? (I have to insist on the lmer because data comes from the same students, hence I need (1|Participant)

rvlenth commented 1 year ago

If all the df are 60, then use 60 for the edf.

Larissa-Cury commented 1 year ago

The edf should be no larger than the smallest d.f. from pairs() using the K-R degrees of freedom

I guess that now I got it! Thank you again! So by "The edf should be no larger than the smallest d.f. from pairs() using the K-R degrees of freedom" you meant that we use the smaller d.f. in the pairwise comparisons, right? Yes, in my case, all dfs are the same, but If I had a smaller one, then I should go with the smaller, right?

### the model is:

mod1 <- lmer(CONT_Y ~  MY_GROUP * YEAR + (1|ID), data = dfModels)

### estimate eemmeans:

group <- emmeans(mod1,~ MY_GROUP|YEAR)
year <- emmeans(mod1,~ YEAR|MY_GROUP)

### pairwise comparisons:

group_p <- emmeans(mod1,~ MY_GROUP|YEAR) %>% pairs(adjust="Tukey")
year_p <- emmeans(mod1,~ YEAR|MY_GROUP) %>% pairs(adjust="Tukey")

### correctiong sigma and edf

sigmaValues <- VarCorr(mod1)
sigmaValues

sigma <- sqrt((0.25743)^2 + (0.15054)^2)

### calculate Cohen's d: 

# eff1 <- eff_size(emm1, sigma = sigma(mod1), edf = df.residual(mod1)) ### before: 

group_p ### check the lowest df

eff1 <- eff_size(group, sigma = sigma, edf = 60)   ## and for group

year_p ### check the lowest df

eff2 <- eff_size(year, sigma = sigma, edf = 60)   ## and for year

Looking in the documentation, I didn't see how to adapt:

simp <- lm(CONT_Y ~  MY_GROUP * YEAR), data = dfModels)
eff_size(pairs(emm), sigma(fiber.lm), df.residual(fiber.lm), method = "identity")

to a lmer so tha'ts why I've performed all steps above. Is there also a straightforward path with pairs to a mixed model?

rvlenth commented 1 year ago

I think this is resolved, (or agree on how it isn't resolved), so am closing.