P-value adjustment using "mvt" and the "single-step" in glht() yielded very different results

hy-lu commented 6 years ago

Hello, as I was trying out some interaction contrasts for a linear mixed model using afex::mixed(), I found that using the "mvt" adjustment in emmeans generated different p-values from those using "single-step" adjustment in multcomp::glht by using as.glht.

I suppose they should be the same or very close to each other as I've read it somewhere in a vignette or pdf manual. I re-create this issue in the following example using the auto.noise data (though it's just a linear model, not a mixed model).

Is it me using it wrong or a bug?

set.seed(1234)
library(emmeans)
library(multcomp)
#> Loading required package: mvtnorm
#> Loading required package: survival
#> Loading required package: TH.data
#> Loading required package: MASS
#> 
#> Attaching package: 'TH.data'
#> The following object is masked from 'package:MASS':
#> 
#>     geyser
#> 
#> Attaching package: 'multcomp'
#> The following object is masked from 'package:emmeans':
#> 
#>     cld
noise.lm <- lm(noise ~ size * type * side, data = auto.noise)

# using "mvt"adjustment ----
contrast(
  emmeans(noise.lm, ~ size * type * side),
  interaction = c("pairwise", "consec"),
  by = "side",
  adjust = "mvt"
)
#> side = L:
#>  size_pairwise type_consec   estimate       SE df t.ratio p.value
#>  S - M         Octel - Std  18.333333 4.409586 24   4.158  0.0020
#>  S - L         Octel - Std -11.666667 4.409586 24  -2.646  0.0701
#>  M - L         Octel - Std -30.000000 4.409586 24  -6.803  <.0001
#> 
#> side = R:
#>  size_pairwise type_consec   estimate       SE df t.ratio p.value
#>  S - M         Octel - Std  23.333333 4.409586 24   5.292  0.0001
#>  S - L         Octel - Std  15.000000 4.409586 24   3.402  0.0126
#>  M - L         Octel - Std  -8.333333 4.409586 24  -1.890  0.2929
#> 
#> P value adjustment: mvt method for 3 tests

# using "single-step" as in glht() ----
summary(as.glht(contrast(
  emmeans(noise.lm, ~ size * type * side),
  interaction = c("pairwise", "consec"),
  by = "side"
)))
#> $`side = L`
#> 
#>   Simultaneous Tests for General Linear Hypotheses
#> 
#> Linear Hypotheses:
#>                         Estimate Std. Error t value Pr(>|t|)    
#> S - M, Octel - Std == 0    18.33       4.41   4.158   <0.001 ***
#> S - L, Octel - Std == 0   -11.67       4.41  -2.646   0.0362 *  
#> M - L, Octel - Std == 0   -30.00       4.41  -6.803   <0.001 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)
#> 
#> 
#> $`side = R`
#> 
#>   Simultaneous Tests for General Linear Hypotheses
#> 
#> Linear Hypotheses:
#>                         Estimate Std. Error t value Pr(>|t|)    
#> S - M, Octel - Std == 0   23.333      4.410   5.292  < 0.001 ***
#> S - L, Octel - Std == 0   15.000      4.410   3.402  0.00631 ** 
#> M - L, Octel - Std == 0   -8.333      4.410  -1.890  0.16323    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)

^{Created on 2018-10-29 by the reprex package (v0.2.1)}

Session info

``` r sessionInfo() #> R version 3.5.1 (2018-07-02) #> Platform: x86_64-w64-mingw32/x64 (64-bit) #> Running under: Windows 10 x64 (build 17134) #> #> Matrix products: default #> #> locale: #> [1] LC_COLLATE=Chinese (Simplified)_China.936 #> [2] LC_CTYPE=Chinese (Simplified)_China.936 #> [3] LC_MONETARY=Chinese (Simplified)_China.936 #> [4] LC_NUMERIC=C #> [5] LC_TIME=Chinese (Simplified)_China.936 #> #> attached base packages: #> [1] stats graphics grDevices utils datasets methods base #> #> other attached packages: #> [1] multcomp_1.4-8 TH.data_1.0-9 MASS_7.3-50 survival_2.42-3 #> [5] mvtnorm_1.0-8 emmeans_1.3.0 #> #> loaded via a namespace (and not attached): #> [1] Rcpp_0.12.19 knitr_1.20 magrittr_1.5 splines_3.5.1 #> [5] xtable_1.8-3 lattice_0.20-35 plyr_1.8.4 stringr_1.3.1 #> [9] tools_3.5.1 grid_3.5.1 coda_0.19-2 htmltools_0.3.6 #> [13] yaml_2.2.0 rprojroot_1.3-2 digest_0.6.18 Matrix_1.2-14 #> [17] codetools_0.2-15 evaluate_0.12 rmarkdown_1.10 sandwich_2.5-0 #> [21] estimability_1.3 stringi_1.2.4 compiler_3.5.1 backports_1.1.2 #> [25] zoo_1.8-4 ```

rvlenth commented 6 years ago

Hmmm. I was able to reproduce these results...

> con = contrast(
+     emmeans(noise.lm, ~ size * type * side),
+     interaction = c("pairwise", "consec"),
+     by = "side"
+ )

> summary(con, adjust = "mvt")
side = L:
 size_pairwise type_consec   estimate       SE df t.ratio p.value
 S - M         Octel - Std  18.333333 4.409586 24   4.158  0.0019
 S - L         Octel - Std -11.666667 4.409586 24  -2.646  0.0701
 M - L         Octel - Std -30.000000 4.409586 24  -6.803  <.0001

side = R:
 size_pairwise type_consec   estimate       SE df t.ratio p.value
 S - M         Octel - Std  23.333333 4.409586 24   5.292  0.0001
 S - L         Octel - Std  15.000000 4.409586 24   3.402  0.0127
 M - L         Octel - Std  -8.333333 4.409586 24  -1.890  0.2929

P value adjustment: mvt method for 3 tests

These results differ slightly from yours -- confirming that at least the mvt algorithm is being used (slight variation in P values due to randomized algorithm). The further mystery is that if I use a different adjustment:

> summary(con, adjust = "sidak")
side = L:
 size_pairwise type_consec   estimate       SE df t.ratio p.value
 S - M         Octel - Std  18.333333 4.409586 24   4.158  0.0011
 S - L         Octel - Std -11.666667 4.409586 24  -2.646  0.0419
 M - L         Octel - Std -30.000000 4.409586 24  -6.803  <.0001

side = R:
 size_pairwise type_consec   estimate       SE df t.ratio p.value
 S - M         Octel - Std  23.333333 4.409586 24   5.292  0.0001
 S - L         Octel - Std  15.000000 4.409586 24   3.402  0.0070
 M - L         Octel - Std  -8.333333 4.409586 24  -1.890  0.1980

P value adjustment: sidak method for 3 tests

I get that the mvt method is more conservative than sidak, which is odd because mvt is supposed to be exact and sidak conservative. So, obviously something is wrong. I only wonder now if it's always been wrong or if not, when did it go wrong?

Thanks for reporting this; I'll take a look.

rvlenth commented 6 years ago

It appears that the by variable is being ignored in the mvt adjustment. If I disable the by grouping, I get the same P values (except for randomized-algorithm discrepancies) as obtained with the grouping, and the as.glht results match as well:

> summary(con, by = NULL, adjust="mvt")
 size_pairwise type_consec side   estimate       SE df t.ratio p.value
 S - M         Octel - Std L     18.333333 4.409586 24   4.158  0.0020
 S - L         Octel - Std L    -11.666667 4.409586 24  -2.646  0.0701
 M - L         Octel - Std L    -30.000000 4.409586 24  -6.803  <.0001
 S - M         Octel - Std R     23.333333 4.409586 24   5.292  0.0002
 S - L         Octel - Std R     15.000000 4.409586 24   3.402  0.0126
 M - L         Octel - Std R     -8.333333 4.409586 24  -1.890  0.2930

P value adjustment: mvt method for 6 tests 

> summary(as.glht(con, by = NULL))

     Simultaneous Tests for General Linear Hypotheses

Linear Hypotheses:
                           Estimate Std. Error t value Pr(>|t|)
S - M, Octel - Std, L == 0   18.333      4.410   4.158  0.00195
S - L, Octel - Std, L == 0  -11.667      4.410  -2.646  0.07008
M - L, Octel - Std, L == 0  -30.000      4.410  -6.803  < 0.001
S - M, Octel - Std, R == 0   23.333      4.410   5.292  < 0.001
S - L, Octel - Std, R == 0   15.000      4.410   3.402  0.01267
M - L, Octel - Std, R == 0   -8.333      4.410  -1.890  0.29289
(Adjusted p values reported -- single-step method)

Also, I can reproduce the as.glht results with the by grouping by doing each group manually:

> summary(con[1:3], adjust="mvt")
 size_pairwise type_consec side  estimate       SE df t.ratio p.value
 S - M         Octel - Std L     18.33333 4.409586 24   4.158  0.0010
 S - L         Octel - Std L    -11.66667 4.409586 24  -2.646  0.0363
 M - L         Octel - Std L    -30.00000 4.409586 24  -6.803  <.0001

P value adjustment: mvt method for 3 tests 

> summary(con[4:6], adjust="mvt")
 size_pairwise type_consec side  estimate       SE df t.ratio p.value
 S - M         Octel - Std R    23.333333 4.409586 24   5.292  <.0001
 S - L         Octel - Std R    15.000000 4.409586 24   3.402  0.0063
 M - L         Octel - Std R    -8.333333 4.409586 24  -1.890  0.1635

P value adjustment: mvt method for 3 tests

Unfortunately, I think this means it's been wrong from the get-go, when by variables were in play.

rvlenth commented 6 years ago

OK; it was relatively easy to fix. I had a flag to compute all P values (and critical values for CIs) at once (rather than separately in each by group) when all the by groups were the same size. This works fine for Tukey, Sidak, Scheffe, etc. adjustments, but not for mvt. So this bug has existed since later renditions of lsmeans, when provisions for nested factors were added.

hy-lu commented 6 years ago

Thanks very much for your quick reply and fix!

rvlenth / emmeans

P-value adjustment using "mvt" and the "single-step" in glht() yielded very different results #57