rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
364 stars 32 forks source link

A question about the linear functions values in a balanced 2-way ANOVA with interaction - joint_tests() #515

Open adrianolszewski opened 1 month ago

adrianolszewski commented 1 month ago

Dear Professor,

This is not a bug report, I'm just trying to understand what's going on under the hood. Would not you mind if I asked you to guide me a little?

The data: It's a toy dataset with 2x2 Sex and Treatment design. It's balanced (no class is missing) and with equal number of observations in each cell.

set.seed(1000)
rbind(
  data.frame(Sex="Male",   Treatment="A", Response = rnorm(10, mean=1)),
  data.frame(Sex="Male",   Treatment="B", Response = rnorm(10, mean=2)),
  data.frame(Sex="Female", Treatment="A", Response = rnorm(10, mean=2)),
  data.frame(Sex="Female", Treatment="B", Response = rnorm(10, mean=2))
) %>% 
  mutate(Sex        = factor(Sex),
         Treatment  = factor(Treatment)) -> balanced

Let's fit a simple linear model to reproduce ANOVA

options(contrasts=c("contr.treatment", "contr.poly"))
m <- lm(Response ~ Sex * Treatment , data=balanced)

Let's run ANOVA. Type-1 = Type-3 in this scenario, so either approach will work. This is just a toy example, let's focus on the numbers itself, not meaning.

> summary(aov(m))
              Df Sum Sq Mean Sq F value Pr(>F)   
Sex            1  7.299   7.299   8.455 0.0062 **
Treatment      1  0.118   0.118   0.137 0.7132   
Sex:Treatment  1  6.157   6.157   7.132 0.0113 * 
Residuals     36 31.077   0.863

Now let's reproduce it with joint_tests (for balanced case LS-means = raw means, so it should same results)

> (jt <- joint_tests(m))
 model term    df1 df2 F.ratio p.value
 Sex             1  36   8.455  0.0062
 Treatment       1  36   0.137  0.7132
 Sex:Treatment   1  36   7.132  0.0113

Perfect agreement.

Now let's check the linear functions values:

> attributes(jt)$est.fcns
$Sex
     (Intercept)    SexMale TreatmentB SexMale:TreatmentB
[1,]           0 -0.8944272          0         -0.4472136

$Treatment
     (Intercept) SexMale TreatmentB SexMale:TreatmentB
[1,]           0       0 -0.8944272         -0.4472136

$`Sex:Treatment`
     (Intercept) SexMale TreatmentB SexMale:TreatmentB
[1,]           0       0          0                 -1

The coefficients for Sex and Treatment look close to -1 and -0.5**.

Let's reproduce the main effects step by step naively, just by creating appropriate contrasts over the LS-means.

> (em <- emmeans(lm(Response ~ Sex * Treatment , data=balanced), ~ Sex*Treatment))

  Sex    Treatment emmean    SE df lower.CL upper.CL
 Female A          2.310 0.294 36    1.714     2.91
 Male   A          0.671 0.294 36    0.075     1.27
 Female B          1.634 0.294 36    1.038     2.23
 Male   B          1.564 0.294 36    0.969     2.16

and the orthogonal contrasts:

#                                     FemA  MalA  FemB  MalB
contrast(em, list("Sex"           = c(-0.5,  0.5, -0.5,  0.5), 
                  "Treatment"     = c(-0.5, -0.5,  0.5,  0.5), 
                  "Sex:Treatment" = c(-0.5,  0.5,  0.5, -0.5))) %>%
    data.frame %>%
    mutate(F = t.ratio^2, .after="t.ratio") %>%
    mutate(across(where(is.numeric), ~round(., 4)))

       contrast estimate     SE df t.ratio      F p.value
1           Sex  -0.8543 0.2938 36 -2.9078 8.4552  0.0062
2     Treatment   0.1088 0.2938 36  0.3704 0.1372  0.7132
3 Sex:Treatment  -0.7847 0.2938 36 -2.6706 7.1324  0.0113

OK, both F statistic and p-values are reproduced

Now let's switch to raw model coefficients. I'm going to reproduce the results with multcomp::glht()

# First I'm defining the means based on the coefficients:
K <- matrix(c(1, 0, 0, 0, # Female A
               1, 0, 1, 0,  # Female B
               1, 1, 0, 0,  # Male A
               1, 1, 1, 1), # Male B           
             byrow = TRUE, ncol = 4,
             dimnames = list(c("Female A",
                               "Female B",
                               "Male A",
                               "Male B"),
                             c("Intercept", "SexMale", 
                               "TreatmentB", "SexMale:TreatmentB")))
> K
         Intercept SexMale TreatmentB SexMale:TreatmentB
Female A         1       0          0                  0
Female B         1       0          1                  0
Male A           1       1          0                  0
Male B           1       1          1                  1

Let's reproduce for Sex: = c(-0.5, 0.5, -0.5, 0.5)

library(multcomp)

  > (contr_sex <- matrix(c(-0.5*K[1,] + 0.5*K[3,] - 0.5*K[2,] + 0.5*K[4,]), byrow = T, nrow = 1))
     [,1] [,2] [,3] [,4]
[1,]    0    1    0  0.5 

> summary(glht(m, matrix(c( 0,  1,  0,  0.5), byrow = T, nrow = 1)))
     Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = Response ~ Sex * Treatment, data = balanced)

Linear Hypotheses:
       Estimate Std. Error t value Pr(>|t|)   
1 == 0  -0.8543     0.2938  -2.908   0.0062 **
---

By the way, this could be obtained from the emmeans directly - it warns about the interaction involvement:

> (x <- contrast(emmeans(m, ~ Sex), "revpairwise"))
NOTE: Results may be misleading due to involvement in interactions
 contrast      estimate    SE df t.ratio p.value
 Male - Female   -0.854 0.294 36  -2.908  0.0062

Results are averaged over the levels of: Treatment 
> x@linfct
     (Intercept) SexMale TreatmentB SexMale:TreatmentB
[1,]           0       1          0                0.5   # rather than 0.8944272 and 0.4472136

Anyway, for Sex it agrees with LS-means and joint_tests() -2.907783^2 = 8.455201 Same for Treatment and their interaction.


OK, now let's compare the coefficients obtained from joint_tests() with the contrasts using 0.5s coefficients: I will only focus on Sex to limit the length:

> summary(glht(m, matrix(c(0, 0.8944272, 0, 0.4472136), byrow = T, nrow = 1)))

     Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = Response ~ Sex * Treatment, data = balanced)

Linear Hypotheses:
       Estimate Std. Error t value Pr(>|t|)   
1 == 0  -0.7641     0.2628  -2.908   0.0062 **

Well, the estimates differ, -0.8543 vs. -0.7641, the standard errors differ too: 0.2938 vs. 0.2628. But t.values agree: -2.907783 in both.

So this agreed closely with was was created by aov() because the coefficients were close enough to 0.5 and 1.

rvlenth commented 1 month ago

I'm traveling. This will have to wait until I get home

adrianolszewski commented 1 month ago

Thank you very much, Professor. I simplified the code above, so when you return please ignore the all intermediate updates.

I understand that the values of the coefficients are not "nicely rounded" because of the interaction and type-3 analysis. I'm studying the joint_tests() function body, but a little support on how these values are obtained will be greatly appreciated.

I guess my answer is there: https://support.sas.com/documentation/onlinedoc/stat/141/introglmest.pdf and https://support.sas.com/documentation/onlinedoc/v82/techreport_r101.pdf


When I used the orthogonal sum (effect) coding, now the coefficients are integers. So I guess that for treatment contrast and type-3 they need to be adjusted.

> options(contrasts=c("contr.sum", "contr.poly"))
> m <- lm(Response ~ Sex * Treatment , data=balanced)
> summary(aov(m))
              Df Sum Sq Mean Sq F value Pr(>F)   
Sex            1  7.299   7.299   8.455 0.0062 **
Treatment      1  0.118   0.118   0.137 0.7132   
Sex:Treatment  1  6.157   6.157   7.132 0.0113 * 
Residuals     36 31.077   0.863                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> car::Anova(m, type=3)
Anova Table (Type III tests)

Response: Response
              Sum Sq Df  F value    Pr(>F)    
(Intercept)   95.461  1 110.5819 1.593e-12 ***
Sex            7.299  1   8.4552  0.006198 ** 
Treatment      0.118  1   0.1372  0.713226    
Sex:Treatment  6.157  1   7.1324  0.011292 *  
Residuals     31.077 36                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (jt <- joint_tests(m))
 model term    df1 df2 F.ratio p.value
 Sex             1  36   8.455  0.0062
 Treatment       1  36   0.137  0.7132
 Sex:Treatment   1  36   7.132  0.0113

> attributes(jt)$est.fcns
$Sex
     (Intercept) Sex1 Treatment1 Sex1:Treatment1
[1,]           0    1          0               0

$Treatment
     (Intercept) Sex1 Treatment1 Sex1:Treatment1
[1,]           0    0          1               0

$`Sex:Treatment`
     (Intercept) Sex1 Treatment1 Sex1:Treatment1
[1,]           0    0          0              -1

Just checked the contrasts:

> t(matrix(c(-0.5*K[1,] -0.5*K[3,] + 0.5*K[2,] + 0.5*K[4,])))
     [,1] [,2] [,3] [,4]
[1,]    0    0    1  0.5
> t(matrix(c(-0.5*K[1,] + 0.5*K[3,] + 0.5*K[2,] - 0.5*K[4,])))
     [,1] [,2] [,3] [,4]
[1,]    0    0    0 -0.5
> t(matrix(c(-0.5*K[1,] + 0.5*K[3,] - 0.5*K[2,] + 0.5*K[4,])))
     [,1] [,2] [,3] [,4]
[1,]    0    1    0  0.5

are not mutually orthogonal.

PS: I remembered, that type-3 is related to weighting by the fraction of observations. I have 10 observations per cell. I tested various values and it turned out that sqrt(8/10) ~ −0.8944272 and sqrt(2/10) ~ 0.4472136. It cannot be a coincidence, so these contrasts were scaled this way.

rvlenth commented 1 month ago

The coefficients are normalized.

> tmp = c(0,-1,0,-.5)
> tmp / sqrt(sum(tmp^2))
[1]  0.0000000 -0.8944272  0.0000000 -0.4472136

So the test statistics and P values are the same, but the scaling is different