rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
364 stars 32 forks source link

Why is `eff_size` output different than a paired samples cohen's d? #426

Closed walrossker closed 1 year ago

walrossker commented 1 year ago

Goal

Compute a Cohen's d effect size from a paired sample with some missing data (i.e., some people only have an observation at time 1 or time 2). I'd like the effect size to be based on a mixed model which does not remove people with incomplete data (the way that typical cohen's d functions in R do).

Issue

In example datasets with complete data, eff_size is returning significantly different effect sizes than other cohen's d functions in R. Maybe my implementation is wrong, or eff_size is returning something that's not a standard paired samples Cohen's d?

Reprex

library(tidyverse)

# Create dataset ---------------------------------------------------------------

set.seed(311)
n_people <- 30
outcome_t1 <- rnorm(n_people)
dat <- tibble(
  id = rep(1:n_people, times = 2) %>% factor(),
  time = rep(1:2, each = n_people) %>% factor(),
  outcome = c(outcome_t1, outcome_t1 + rnorm(outcome_t1, mean = 1, sd = .5))
)

# Calculate Cohen's d ----------------------------------------------------------

## Using paired sample method (within = FALSE) ----
## (Both methods yield the same result)
effectsize::cohens_d(x = dat$outcome[1:n_people],
                     y = dat$outcome[(n_people+1):(2*n_people)],
                     paired = TRUE)$Cohens_d 
#> [1] -2.533599
effsize::cohen.d(formula = outcome ~ time | Subject(id),
                 data = dat,
                 paired = TRUE,
                 within = FALSE)$estimate
#> [1] -2.533599

## Using paired sample method (within = TRUE) ----
effsize::cohen.d(formula = outcome ~ time | Subject(id),
                 data = dat,
                 paired = TRUE,
                 within = TRUE)$estimate
#> [1] -0.7336026

## Based on mixed model followed with emmeans::eff_size ----
mod_mixed <- lme4::lmer(outcome ~ time + (1|id), data = dat)
em_obj <-  emmeans::emmeans(mod_mixed, "time")
degrees_of_freedom <- summary(pairs(em_obj))$df
summary(emmeans::eff_size(em_obj, sigma = sigma(mod_mixed),
                          edf = degrees_of_freedom))$effect.size
#> [1] -3.58305
rvlenth commented 1 year ago

I am traveling so cannot look into this right away. But Inote you got two different results from effsize. So I wonder which result you were expecting to get from emmeans -- and why.

walrossker commented 1 year ago

I think I'm more interested in the second one (i.e., within = TRUE), but I'm not totally sure since I would've assumed that paired = TRUE already implies that I'm interested in within-person changes. Any guidance you can provide on that choice is much appreciated, but I just included both to make sure that the deviation from eff_size was not due to misspecifying that option.

rvlenth commented 1 year ago

OK, but don't get your hopes up that I can even answer the question. There are many situations where I am not sure that Cohen's d is even unambiguously defined. What if we have heterogenous variances? What if we have more than one source of variation (as I think is true in what we're discussing)?

And then there is the vast realm of knowledge that I simply lack. I don't work with effect sizes, and am easily confused by what is done in other R packages or in articles in various application fields where terminology shifts or is unfamiliar to me...

rvlenth commented 1 year ago

OK, here's a clue:

> eff_size(em_obj, sigma = sqrt(2)*sigma(mod_mixed), edf = degrees_of_freedom)
 contrast      effect.size    SE   df lower.CL upper.CL
 time1 - time2       -2.53 0.379 30.8    -3.31    -1.76

sigma used for effect sizes: 0.4124 
Degrees-of-freedom method: inherited from kenward-roger when re-gridding 
Confidence level used: 0.95 

This is the same estimate as

> effsize::cohen.d(formula = outcome ~ time | Subject(id),
+                  data = dat,
+                  paired = TRUE,
+                  within = FALSE)

Cohen's d

d estimate: -2.533599 (large)
95 percent confidence interval:
    lower     upper 
-3.283425 -1.783773

although the CI from emmeans is slightly wider.

The issue here is that with paired data, Cohen's $d$ is apparently relative to the SD of the paired differences, whereas sigma(mod_mixed) estimates the error SD, which is the SD of the individual errors after adjusting for the factor effects. The SD of the differences is sqrt(2) times that error SD.

I don't understand what effsize::cohen.d is doing with its within argument. It seems to be doing the opposite of what seems right to me. We have:

> lme4::VarCorr(mod_mixed)
 Groups   Name        Std.Dev.
 id       (Intercept) 1.14468 
 Residual             0.29162 

and these SDs are between-subjects and within-subjects, respectively. So if you say within = TRUE, that suggests we should use .29162 as the error SD -- which is what it actually did when we said within = FALSE. Moreover, when using paired differences, the between-subjects variance cancels out and so it makes no sense to use the between-subjects SD in any role. I suspect that you should never use paired = TRUE and within = TRUE together in effsize::cohen.d. You might want to ask the effsize developer about that.

As I've mentioned before, I am not sold on any kind of Cohen-d effect size as being appropriate in the context of a mixed model.

walrossker commented 1 year ago

Thanks for looking into this! Great to be able to replicate one of the outputs. The issues board doesn’t seem very active on the effsize repo, but will report back if I figure anything out about the within argument

rvlenth commented 1 year ago

I think this is resolved so am closing this issue. BTW, I added a note about paired data to the documentation for eff_size