Closed mspeekenbrink closed 2 years ago
Hmm, this can indeed be interpret as a bug. The problem is that factors attached to the data are not maintained when transforming the data from long to wide. However, I am not sure this is worth fixing. It seems better to me to always enforce contr.sum()
(i.e., simply remove the check_contrasts
argument).
This is also what car::Anova
does for the idata
argument. Any contrasts to it are ignored and contr.sum
enforced. This means, the only thing that matters is the contrast to task
here. So to replicate the erroneous results (which are implicit contr.treatment
for task
) you can do:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(car)
#> Loading required package: carData
#>
#> Attaching package: 'car'
#> The following object is masked from 'package:dplyr':
#>
#> recode
## prepare the data
data(fhch2010, package="afex")
ldat <- fhch2010 %>%
mutate(
density = forcats::fct_recode(
.f = density, lowdens = "low", highdens = "high"),
frequency = forcats::fct_recode(
.f = frequency, lowfreq = "low", highfreq = "high")) %>%
group_by(id, task, stimulus, density, frequency) %>%
summarise(mean_rt = mean(rt))
#> `summarise()` has grouped output by 'id', 'task', 'stimulus', 'density'. You can override using the `.groups` argument.
wdat <- tidyr::pivot_wider(ldat, id_cols = c("id", "task"),
names_from = c("stimulus", "density", "frequency"),
values_from = "mean_rt")
idata <- data.frame(stimulus=factor(rep(c("word","nonword"), each=4)),
density = factor(rep(c("low","high"), each=2)),
frequency = factor(rep(c("low","high"))))
lmod <- lm(cbind(word_lowdens_lowfreq, word_lowdens_highfreq, word_highdens_lowfreq, word_highdens_highfreq, nonword_lowdens_lowfreq,nonword_lowdens_highfreq, nonword_highdens_lowfreq,nonword_highdens_highfreq) ~ task, data=wdat)
rmod <- car::Anova(lmod,type=3, idata=idata, idesign=~stimulus*density*frequency)
summary(rmod, multivariate=FALSE)
#>
#> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
#>
#> Sum Sq num Df Error SS den Df F value
#> (Intercept) 121.130 1 16.9508 43 307.2776
#> task 6.184 1 16.9508 43 15.6871
#> stimulus 3.411 1 1.5622 43 93.8836
#> task:stimulus 1.166 1 1.5622 43 32.0827
#> density 0.037 1 0.1371 43 11.7390
#> task:density 0.105 1 0.1371 43 32.7990
#> frequency 0.299 1 0.4285 43 30.0345
#> task:frequency 0.831 1 0.4285 43 83.3957
#> stimulus:density 0.018 1 0.1641 43 4.7482
#> task:stimulus:density 0.059 1 0.1641 43 15.3721
#> stimulus:frequency 0.668 1 0.2061 43 139.2950
#> task:stimulus:frequency 0.512 1 0.2061 43 106.8858
#> density:frequency 0.007 1 0.1165 43 2.7400
#> task:density:frequency 0.047 1 0.1165 43 17.4933
#> stimulus:density:frequency 0.001 1 0.2178 43 0.1935
#> task:stimulus:density:frequency 0.048 1 0.2178 43 9.4640
#> Pr(>F)
#> (Intercept) < 2.2e-16 ***
#> task 0.0002765 ***
#> stimulus 2.230e-12 ***
#> task:stimulus 1.124e-06 ***
#> density 0.0013583 **
#> task:density 9.110e-07 ***
#> frequency 2.072e-06 ***
#> task:frequency 1.260e-11 ***
#> stimulus:density 0.0348541 *
#> task:stimulus:density 0.0003124 ***
#> stimulus:frequency 4.478e-15 ***
#> task:stimulus:frequency 3.112e-13 ***
#> density:frequency 0.1051437
#> task:density:frequency 0.0001392 ***
#> stimulus:density:frequency 0.6622147
#> task:stimulus:density:frequency 0.0036371 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
contrasts(wdat$task) <- contr.poly(2)
lmod2 <- lm(cbind(word_lowdens_lowfreq, word_lowdens_highfreq, word_highdens_lowfreq, word_highdens_highfreq, nonword_lowdens_lowfreq,nonword_lowdens_highfreq, nonword_highdens_lowfreq,nonword_highdens_highfreq) ~ task, data=wdat)
rmod2 <- car::Anova(lmod2,type=3, idata=idata, idesign=~stimulus*density*frequency)
summary(rmod2, multivariate=FALSE)
#>
#> Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
#>
#> Sum Sq num Df Error SS den Df F value
#> (Intercept) 356.96 1 16.9508 43 905.5211
#> task 6.18 1 16.9508 43 15.6871
#> stimulus 2.80 1 1.5622 43 77.0863
#> task:stimulus 1.17 1 1.5622 43 32.0827
#> density 0.00 1 0.1371 43 0.3838
#> task:density 0.10 1 0.1371 43 32.7990
#> frequency 0.01 1 0.4285 43 0.9263
#> task:frequency 0.83 1 0.4285 43 83.3957
#> stimulus:density 0.00 1 0.1641 43 0.4521
#> task:stimulus:density 0.06 1 0.1641 43 15.3721
#> stimulus:frequency 0.25 1 0.2061 43 52.6397
#> task:stimulus:frequency 0.51 1 0.2061 43 106.8858
#> density:frequency 0.01 1 0.1165 43 2.9409
#> task:density:frequency 0.05 1 0.1165 43 17.4933
#> stimulus:density:frequency 0.03 1 0.2178 43 5.8593
#> task:stimulus:density:frequency 0.05 1 0.2178 43 9.4640
#> Pr(>F)
#> (Intercept) < 2.2e-16 ***
#> task 0.0002765 ***
#> stimulus 3.839e-11 ***
#> task:stimulus 1.124e-06 ***
#> density 0.5388404
#> task:density 9.110e-07 ***
#> frequency 0.3412008
#> task:frequency 1.260e-11 ***
#> stimulus:density 0.5049326
#> task:stimulus:density 0.0003124 ***
#> stimulus:frequency 5.503e-09 ***
#> task:stimulus:frequency 3.112e-13 ***
#> density:frequency 0.0935614 .
#> task:density:frequency 0.0001392 ***
#> stimulus:density:frequency 0.0197911 *
#> task:stimulus:density:frequency 0.0036371 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Created on 2021-11-28 by the reprex package (v2.0.1)
So I guess my main question still remains: What could be the benefit of allowing non contr.sum
for the between subjects effects (as for the within subject effects they are enforced by car::Anova
)? I would not see any real reason so would be inclined to simply disable check_contrasts
altogether.
I didn't realise car:;Anova()
ignores contrasts in the idata
argument. But a main reason why I submitted this as a bug is that before running mod2
in my code, I explicitly set contrasts(ldat$task) <- contr.sum(2)
. So I don't understand why (if contrasts are ignored for within-subjects effects) the results would be different between mod1
and mod2
, as both should have a contr.sum
for task
. (I also don't understand why the warning states Calculating Type 3 sums with contrasts != 'contr.sum' for: task
) Is the contrast set for ldat$task
changed within the code so that it is no longer a contr.sum(2)
contrast but a contr.treat(2)
contrast?
Ah, I think you mean none of the contrasts are preserved for an analysis where a move from long to wide is needed (whether within- or between-subjects). But then the check_contrasts
argument is misleading. For complex designs, there may be use cases for a different form of sum-to-zero contrasts, although the examples I can think of are more relevant for mixed-effects analyses.
So as you are focusing on factorial designs, to avoid issues where people think they can set different contrasts (or even contr.sum
themselves and then set check_contrasts = FALSE
, it might be wise to remove this option. Or otherwise copy contrasts from the long to wide format.
I'm getting unexpected differences in results after setting contrasts and then running
afex::aov_car
withcheck_contrasts = FALSE
.Here is a reproducible example, where I'm repeating the analysis three times. First with
aov_car
andcheck_contrasts = TRUE
, second after setting contrasts withcontr.sum
andcontr.poly
and then running the same model withcheck_contrasts = FALSE
, and finally withcar::Anova()
. The results of the second analysis differ quite a bit from the other two, and I don't understand why...Created on 2021-11-26 by the reprex package (v2.0.1)