singmann / afex

Analysis of Factorial EXperiments (R package)
119 stars 32 forks source link

Unexpected outcomes when setting contrasts manually #109

Closed mspeekenbrink closed 2 years ago

mspeekenbrink commented 2 years ago

I'm getting unexpected differences in results after setting contrasts and then running afex::aov_car with check_contrasts = FALSE.

Here is a reproducible example, where I'm repeating the analysis three times. First with aov_car and check_contrasts = TRUE, second after setting contrasts with contr.sum and contr.poly and then running the same model with check_contrasts = FALSE, and finally with car::Anova(). The results of the second analysis differ quite a bit from the other two, and I don't understand why...

library(afex)
#> Loading required package: lme4
#> Loading required package: Matrix
#> ************
#> Welcome to afex. For support visit: http://afex.singmann.science/
#> - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
#> - Methods for calculating p-values with mixed(): 'S', 'KR', 'LRT', and 'PB'
#> - 'afex_aov' and 'mixed' objects can be passed to emmeans() for follow-up tests
#> - NEWS: emmeans() for ANOVA models now uses model = 'multivariate' as default.
#> - Get and set global package options with: afex_options()
#> - Set orthogonal sum-to-zero contrasts globally: set_sum_contrasts()
#> - For example analyses see: browseVignettes("afex")
#> ************
#> 
#> Attaching package: 'afex'
#> The following object is masked from 'package:lme4':
#> 
#>     lmer
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(car)
#> Loading required package: carData
#> 
#> Attaching package: 'car'
#> The following object is masked from 'package:dplyr':
#> 
#>     recode

## prepare the data
data(fhch2010, package="afex")
ldat <- fhch2010 %>%
  mutate(density = forcats::fct_recode(density, lowdens = "low", highdens = "high"),
         frequency = forcats::fct_recode(frequency, lowfreq = "low", highfreq = "high")) %>%
  group_by(id, task, stimulus, density, frequency) %>%
  summarise(mean_rt = mean(rt))
#> `summarise()` has grouped output by 'id', 'task', 'stimulus', 'density'. You can override using the `.groups` argument.
ldat$id <- as.factor(ldat$id)
ldat$task <- as.factor(ldat$task)
ldat$stimulus <- as.factor(ldat$stimulus)
ldat$density <- as.factor(ldat$density)
ldat$frequency <- as.factor(ldat$frequency)

wdat <- tidyr::pivot_wider(ldat, id_cols = c("id", "task"), names_from = c("stimulus", "density", "frequency"), values_from = "mean_rt")

## 
mod1 <- afex::aov_car(mean_rt ~ task*stimulus*density*frequency + Error(id/(stimulus*density*frequency)), data=ldat)
#> Contrasts set to contr.sum for the following variables: task
mod1
#> Anova Table (Type 3 tests)
#> 
#> Response: mean_rt
#>                             Effect    df  MSE          F   ges p.value
#> 1                             task 1, 43 0.39  15.69 ***  .238   <.001
#> 2                         stimulus 1, 43 0.04  77.09 ***  .124   <.001
#> 3                    task:stimulus 1, 43 0.04  32.08 ***  .056   <.001
#> 4                          density 1, 43 0.00       0.38 <.001    .539
#> 5                     task:density 1, 43 0.00  32.80 ***  .005   <.001
#> 6                        frequency 1, 43 0.01       0.93 <.001    .341
#> 7                   task:frequency 1, 43 0.01  83.40 ***  .040   <.001
#> 8                 stimulus:density 1, 43 0.00       0.45 <.001    .505
#> 9            task:stimulus:density 1, 43 0.00  15.37 ***  .003   <.001
#> 10              stimulus:frequency 1, 43 0.00  52.64 ***  .013   <.001
#> 11         task:stimulus:frequency 1, 43 0.00 106.89 ***  .025   <.001
#> 12               density:frequency 1, 43 0.00     2.94 + <.001    .094
#> 13          task:density:frequency 1, 43 0.00  17.49 ***  .002   <.001
#> 14      stimulus:density:frequency 1, 43 0.01     5.86 *  .001    .020
#> 15 task:stimulus:density:frequency 1, 43 0.01    9.46 **  .002    .004
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

## after setting contrasts

contrasts(ldat$task) <- contr.sum(2)
contrasts(ldat$stimulus) <- contr.poly(2)
contrasts(ldat$density) <- contr.poly(2)
contrasts(ldat$frequency) <- contr.poly(2)
mod2 <- afex::aov_car(mean_rt ~ task*stimulus*density*frequency + Error(id/(stimulus*density*frequency)), data=ldat, check_contrasts=FALSE)
#> Warning: Calculating Type 3 sums with contrasts != 'contr.sum' for: task
#>   Results likely bogus or not interpretable!
#> You probably want check_contrasts = TRUE or options(contrasts=c('contr.sum','contr.poly'))
mod2
#> Anova Table (Type 3 tests)
#> 
#> Response: mean_rt
#>                             Effect    df  MSE          F   ges p.value
#> 1                             task 1, 43 0.39  15.69 ***  .238   <.001
#> 2                         stimulus 1, 43 0.04  93.88 ***  .147   <.001
#> 3                    task:stimulus 1, 43 0.04  32.08 ***  .056   <.001
#> 4                          density 1, 43 0.00   11.74 **  .002    .001
#> 5                     task:density 1, 43 0.00  32.80 ***  .005   <.001
#> 6                        frequency 1, 43 0.01  30.03 ***  .015   <.001
#> 7                   task:frequency 1, 43 0.01  83.40 ***  .040   <.001
#> 8                 stimulus:density 1, 43 0.00     4.75 * <.001    .035
#> 9            task:stimulus:density 1, 43 0.00  15.37 ***  .003   <.001
#> 10              stimulus:frequency 1, 43 0.00 139.30 ***  .033   <.001
#> 11         task:stimulus:frequency 1, 43 0.00 106.89 ***  .025   <.001
#> 12               density:frequency 1, 43 0.00       2.74 <.001    .105
#> 13          task:density:frequency 1, 43 0.00  17.49 ***  .002   <.001
#> 14      stimulus:density:frequency 1, 43 0.01       0.19 <.001    .662
#> 15 task:stimulus:density:frequency 1, 43 0.01    9.46 **  .002    .004
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

## perform analysis with car

idata <- data.frame(stimulus=factor(rep(c("word","nonword"), each=4)),
                    density = factor(rep(c("low","high"), each=2)),
                    frequency = factor(rep(c("low","high"))))
contrasts(idata$stimulus) <- contr.sum(2)
contrasts(idata$density) <- contr.poly(2)
contrasts(idata$frequency) <- contr.poly(2)
contrasts(wdat$task) <- contr.poly(2)
lmod <- lm(cbind(word_lowdens_lowfreq, word_lowdens_highfreq, word_highdens_lowfreq, word_highdens_highfreq, nonword_lowdens_lowfreq,nonword_lowdens_highfreq, nonword_highdens_lowfreq,nonword_highdens_highfreq) ~ task, data=wdat)
rmod <- car::Anova(lmod,type=3, idata=idata, idesign=~stimulus*density*frequency)
summary(rmod, multivariate=FALSE)
#> 
#> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
#> 
#>                                 Sum Sq num Df Error SS den Df  F value
#> (Intercept)                     356.96      1  16.9508     43 905.5211
#> task                              6.18      1  16.9508     43  15.6871
#> stimulus                          2.80      1   1.5622     43  77.0863
#> task:stimulus                     1.17      1   1.5622     43  32.0827
#> density                           0.00      1   0.1371     43   0.3838
#> task:density                      0.10      1   0.1371     43  32.7990
#> frequency                         0.01      1   0.4285     43   0.9263
#> task:frequency                    0.83      1   0.4285     43  83.3957
#> stimulus:density                  0.00      1   0.1641     43   0.4521
#> task:stimulus:density             0.06      1   0.1641     43  15.3721
#> stimulus:frequency                0.25      1   0.2061     43  52.6397
#> task:stimulus:frequency           0.51      1   0.2061     43 106.8858
#> density:frequency                 0.01      1   0.1165     43   2.9409
#> task:density:frequency            0.05      1   0.1165     43  17.4933
#> stimulus:density:frequency        0.03      1   0.2178     43   5.8593
#> task:stimulus:density:frequency   0.05      1   0.2178     43   9.4640
#>                                    Pr(>F)    
#> (Intercept)                     < 2.2e-16 ***
#> task                            0.0002765 ***
#> stimulus                        3.839e-11 ***
#> task:stimulus                   1.124e-06 ***
#> density                         0.5388404    
#> task:density                    9.110e-07 ***
#> frequency                       0.3412008    
#> task:frequency                  1.260e-11 ***
#> stimulus:density                0.5049326    
#> task:stimulus:density           0.0003124 ***
#> stimulus:frequency              5.503e-09 ***
#> task:stimulus:frequency         3.112e-13 ***
#> density:frequency               0.0935614 .  
#> task:density:frequency          0.0001392 ***
#> stimulus:density:frequency      0.0197911 *  
#> task:stimulus:density:frequency 0.0036371 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Created on 2021-11-26 by the reprex package (v2.0.1)

singmann commented 2 years ago

Hmm, this can indeed be interpret as a bug. The problem is that factors attached to the data are not maintained when transforming the data from long to wide. However, I am not sure this is worth fixing. It seems better to me to always enforce contr.sum() (i.e., simply remove the check_contrasts argument).

This is also what car::Anova does for the idata argument. Any contrasts to it are ignored and contr.sum enforced. This means, the only thing that matters is the contrast to task here. So to replicate the erroneous results (which are implicit contr.treatment for task) you can do:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(car)
#> Loading required package: carData
#> 
#> Attaching package: 'car'
#> The following object is masked from 'package:dplyr':
#> 
#>     recode

## prepare the data
data(fhch2010, package="afex")
ldat <- fhch2010 %>%
  mutate(
    density = forcats::fct_recode(
      .f = density, lowdens = "low", highdens = "high"),
    frequency = forcats::fct_recode(
      .f = frequency, lowfreq = "low", highfreq = "high")) %>%
  group_by(id, task, stimulus, density, frequency) %>%
  summarise(mean_rt = mean(rt))
#> `summarise()` has grouped output by 'id', 'task', 'stimulus', 'density'. You can override using the `.groups` argument.
wdat <- tidyr::pivot_wider(ldat, id_cols = c("id", "task"), 
                           names_from = c("stimulus", "density", "frequency"),
                           values_from = "mean_rt")
idata <- data.frame(stimulus=factor(rep(c("word","nonword"), each=4)),
                    density = factor(rep(c("low","high"), each=2)),
                    frequency = factor(rep(c("low","high"))))
lmod <- lm(cbind(word_lowdens_lowfreq, word_lowdens_highfreq, word_highdens_lowfreq, word_highdens_highfreq, nonword_lowdens_lowfreq,nonword_lowdens_highfreq, nonword_highdens_lowfreq,nonword_highdens_highfreq) ~ task, data=wdat)
rmod <- car::Anova(lmod,type=3, idata=idata, idesign=~stimulus*density*frequency)
summary(rmod, multivariate=FALSE)
#> 
#> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
#> 
#>                                  Sum Sq num Df Error SS den Df  F value
#> (Intercept)                     121.130      1  16.9508     43 307.2776
#> task                              6.184      1  16.9508     43  15.6871
#> stimulus                          3.411      1   1.5622     43  93.8836
#> task:stimulus                     1.166      1   1.5622     43  32.0827
#> density                           0.037      1   0.1371     43  11.7390
#> task:density                      0.105      1   0.1371     43  32.7990
#> frequency                         0.299      1   0.4285     43  30.0345
#> task:frequency                    0.831      1   0.4285     43  83.3957
#> stimulus:density                  0.018      1   0.1641     43   4.7482
#> task:stimulus:density             0.059      1   0.1641     43  15.3721
#> stimulus:frequency                0.668      1   0.2061     43 139.2950
#> task:stimulus:frequency           0.512      1   0.2061     43 106.8858
#> density:frequency                 0.007      1   0.1165     43   2.7400
#> task:density:frequency            0.047      1   0.1165     43  17.4933
#> stimulus:density:frequency        0.001      1   0.2178     43   0.1935
#> task:stimulus:density:frequency   0.048      1   0.2178     43   9.4640
#>                                    Pr(>F)    
#> (Intercept)                     < 2.2e-16 ***
#> task                            0.0002765 ***
#> stimulus                        2.230e-12 ***
#> task:stimulus                   1.124e-06 ***
#> density                         0.0013583 ** 
#> task:density                    9.110e-07 ***
#> frequency                       2.072e-06 ***
#> task:frequency                  1.260e-11 ***
#> stimulus:density                0.0348541 *  
#> task:stimulus:density           0.0003124 ***
#> stimulus:frequency              4.478e-15 ***
#> task:stimulus:frequency         3.112e-13 ***
#> density:frequency               0.1051437    
#> task:density:frequency          0.0001392 ***
#> stimulus:density:frequency      0.6622147    
#> task:stimulus:density:frequency 0.0036371 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

contrasts(wdat$task) <- contr.poly(2)
lmod2 <- lm(cbind(word_lowdens_lowfreq, word_lowdens_highfreq, word_highdens_lowfreq, word_highdens_highfreq, nonword_lowdens_lowfreq,nonword_lowdens_highfreq, nonword_highdens_lowfreq,nonword_highdens_highfreq) ~ task, data=wdat)
rmod2 <- car::Anova(lmod2,type=3, idata=idata, idesign=~stimulus*density*frequency)
summary(rmod2, multivariate=FALSE)
#> 
#> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
#> 
#>                                 Sum Sq num Df Error SS den Df  F value
#> (Intercept)                     356.96      1  16.9508     43 905.5211
#> task                              6.18      1  16.9508     43  15.6871
#> stimulus                          2.80      1   1.5622     43  77.0863
#> task:stimulus                     1.17      1   1.5622     43  32.0827
#> density                           0.00      1   0.1371     43   0.3838
#> task:density                      0.10      1   0.1371     43  32.7990
#> frequency                         0.01      1   0.4285     43   0.9263
#> task:frequency                    0.83      1   0.4285     43  83.3957
#> stimulus:density                  0.00      1   0.1641     43   0.4521
#> task:stimulus:density             0.06      1   0.1641     43  15.3721
#> stimulus:frequency                0.25      1   0.2061     43  52.6397
#> task:stimulus:frequency           0.51      1   0.2061     43 106.8858
#> density:frequency                 0.01      1   0.1165     43   2.9409
#> task:density:frequency            0.05      1   0.1165     43  17.4933
#> stimulus:density:frequency        0.03      1   0.2178     43   5.8593
#> task:stimulus:density:frequency   0.05      1   0.2178     43   9.4640
#>                                    Pr(>F)    
#> (Intercept)                     < 2.2e-16 ***
#> task                            0.0002765 ***
#> stimulus                        3.839e-11 ***
#> task:stimulus                   1.124e-06 ***
#> density                         0.5388404    
#> task:density                    9.110e-07 ***
#> frequency                       0.3412008    
#> task:frequency                  1.260e-11 ***
#> stimulus:density                0.5049326    
#> task:stimulus:density           0.0003124 ***
#> stimulus:frequency              5.503e-09 ***
#> task:stimulus:frequency         3.112e-13 ***
#> density:frequency               0.0935614 .  
#> task:density:frequency          0.0001392 ***
#> stimulus:density:frequency      0.0197911 *  
#> task:stimulus:density:frequency 0.0036371 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Created on 2021-11-28 by the reprex package (v2.0.1)

So I guess my main question still remains: What could be the benefit of allowing non contr.sum for the between subjects effects (as for the within subject effects they are enforced by car::Anova)? I would not see any real reason so would be inclined to simply disable check_contrasts altogether.

mspeekenbrink commented 2 years ago

I didn't realise car:;Anova() ignores contrasts in the idata argument. But a main reason why I submitted this as a bug is that before running mod2 in my code, I explicitly set contrasts(ldat$task) <- contr.sum(2). So I don't understand why (if contrasts are ignored for within-subjects effects) the results would be different between mod1 and mod2, as both should have a contr.sum for task. (I also don't understand why the warning states Calculating Type 3 sums with contrasts != 'contr.sum' for: task) Is the contrast set for ldat$task changed within the code so that it is no longer a contr.sum(2) contrast but a contr.treat(2) contrast?

mspeekenbrink commented 2 years ago

Ah, I think you mean none of the contrasts are preserved for an analysis where a move from long to wide is needed (whether within- or between-subjects). But then the check_contrasts argument is misleading. For complex designs, there may be use cases for a different form of sum-to-zero contrasts, although the examples I can think of are more relevant for mixed-effects analyses.

So as you are focusing on factorial designs, to avoid issues where people think they can set different contrasts (or even contr.sum themselves and then set check_contrasts = FALSE, it might be wise to remove this option. Or otherwise copy contrasts from the long to wide format.