Which `edf` should be included for mixed models (and when `by`) is used to `eff_size`?

iago-pssjd commented 2 years ago

Hi!

I know the mixed model example in ?eff_size, but let me comment it later and start by other place. In the vignette https://cran.r-project.org/web/packages/emmeans/vignettes/comparisons.html I see

pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
pigs.emm.s <- emmeans(pigs.lm, "source")
pairs(pigs.emm.s)
##  contrast    estimate     SE df t.ratio p.value
##  fish - soy    -0.273 0.0529 23  -5.153  0.0001
##  fish - skim   -0.402 0.0542 23  -7.428  <.0001
##  soy - skim    -0.130 0.0530 23  -2.442  0.0570
## 
## Results are averaged over the levels of: percent 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 3 estimates

eff_size(pigs.emm.s, sigma = sigma(pigs.lm), edf = 23)

Therefore, I thought the edf value was got from the output of pairs(pigs.emm.s). Let's use a linear mixed model

library(lme4)
library(emmeans)
data(Orthodont,package="nlme")
Orthodont$nsex <- as.numeric(Orthodont$Sex=="Male")
Orthodont$nsexage <- with(Orthodont, nsex*age)
fit <- lmer(distance ~ age + (age|Subject) + Sex, data=Orthodont)
fit.s <- emmeans(fit, "Sex")
pairs(fit.s)
# gives df = 25

so I would try eff_size(fit.s, sigma = sigma(fit), edf = 25). But in the examples in ?eff_size I see that edf is approached by

df.residual(fit)
[1] 101

which is a very different output. Then I saw the mixed model example mentioned at the beginning:

### Mixed model example:
if (require(nlme)) withAutoprint({
  Oats.lme <- lme(yield ~ Variety + factor(nitro), 
                  random = ~ 1 | Block / Variety,
                  data = Oats)
  # Combine variance estimates
  VarCorr(Oats.lme)
  (totSD <- sqrt(214.4724 + 109.6931 + 162.5590))
  # I figure edf is somewhere between 5 (Blocks df) and 51 (Resid df)
  emmV <- emmeans(Oats.lme, ~ Variety)
  eff_size(emmV, sigma = totSD, edf = 5)
  eff_size(emmV, sigma = totSD, edf = 51)
}, spaced = TRUE)

Here, df.residual does not work for lme models. How did you get the values 5 and 51? 5 coincides with the df printed in emmV. Is this a good reference to cacth the df? In the examples above, would it has sense to use 25 or 101?

Let me ask you now about the by use. In your example

warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
emmeans (warp.lm,  ~ wool | tension)

all rows (all levels of tension and wool) have the same df. But what df to choose if different levels have different df?

Thank you!

rvlenth commented 2 years ago

How did you get the values 5 and 51?

That is explained in the comment line in the example:

# I figure edf is somewhere between 5 (Blocks df) and 51 (Resid df)

In the examples above, would it has sense to use 25 or 101?

Maybe. But actually it would make little difference. The value of `edf`` quantifies the uncertainty in the denominator of Cohen's d. It will not affect the estimate of d, just the width of the CI for it, and once you get above 25 or 30 d.f., that's already not that different from infinity.

But what df to choose if different levels have different df?

To err on the conservative side, I'd use the minimum of the d.f. for the pairwise differences (not for the means themselves).

iago-pssjd commented 2 years ago

Thanks for the answer!

Just to make clear, I've read the comment

# I figure edf is somewhere between 5 (Blocks df) and 51 (Resid df)

but I do not know where do you see the Blocks df and the Resid df in the output of the lme object. Well, the Blocks df coincides with the number under Block under Number of groups, is it its df? However I do not see the 51...

rvlenth / emmeans

Which `edf` should be included for mixed models (and when `by`) is used to `eff_size`? #379