kassambara / rstatix

Pipe-friendly Framework for Basic Statistical Tests in R
https://rpkgs.datanovia.com/rstatix/
445 stars 51 forks source link

Degrees of freedom in emmeans_test #163

Open thyagoleal opened 2 years ago

thyagoleal commented 2 years ago

Dear Kassambara,

Should the degrees of freedom change in the example below (from 54 to 27)? I expected that the group_by would still report df = 27 regardless, as reported in the subset dataset. Please also note the differences in the test statistic in the two examples.

Am I missing something?

Thank you.

library("rstatix")
df <- ToothGrowth
df$dose <- as.factor(df$dose)

# Pairwise comparisons grouped by supp
df %>%
 group_by(supp) %>%
 emmeans_test(len ~ dose, p.adjust.method = "bonferroni")

# A tibble: 6 × 10
  supp  term  .y.   group1 group2    df statistic        p    p.adj p.adj.signif
* <chr> <chr> <chr> <chr>  <chr>  <dbl>     <dbl>    <dbl>    <dbl> <chr>       
1 OJ    dose  len   0.5    1         54     -5.83 3.18e- 7 9.53e- 7 ****        
2 OJ    dose  len   0.5    2         54     -7.90 1.43e-10 4.29e-10 ****        
3 OJ    dose  len   1      2         54     -2.07 4.34e- 2 1.30e- 1 ns          
4 VC    dose  len   0.5    1         54     -5.41 1.46e- 6 4.39e- 6 ****        
5 VC    dose  len   0.5    2         54    -11.2  1.13e-15 3.39e-15 ****        
6 VC    dose  len   1      2         54     -5.77 3.98e- 7 1.19e- 6 ****  

# Pairwise comparisons ungrouped, but subseted
df %>%
 filter(supp == "OJ") %>%
 emmeans_test(len ~ dose, p.adjust.method = "bonferroni") 

# A tibble: 3 × 9
  term  .y.   group1 group2    df statistic            p      p.adj p.adj.signif
* <chr> <chr> <chr>  <chr>  <dbl>     <dbl>        <dbl>      <dbl> <chr>       
1 dose  len   0.5    1         27     -5.64 0.00000544      1.63e-5 ****        
2 dose  len   0.5    2         27     -7.65 0.0000000318    9.54e-8 ****        
3 dose  len   1      2         27     -2.00 0.0554          1.66e-1 ns 
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=pt_BR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=pt_BR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rstatix_0.7.0   forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7    
 [5] purrr_0.3.4     readr_2.1.2     tidyr_1.2.0     tibble_3.1.6   
 [9] ggplot2_3.3.5   tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9       lubridate_1.8.0  mvtnorm_1.1-3    lattice_0.20-45 
 [5] zoo_1.8-9        assertthat_0.2.1 utf8_1.2.2       R6_2.5.1        
 [9] cellranger_1.1.0 backports_1.4.1  reprex_2.0.1     coda_0.19-4     
[13] httr_1.4.2       pillar_1.7.0     rlang_1.0.1      multcomp_1.4-18 
[17] readxl_1.3.1     rstudioapi_0.13  car_3.0-12       Matrix_1.4-1    
[21] splines_4.1.3    munsell_0.5.0    broom_0.7.12     compiler_4.1.3  
[25] modelr_0.1.8     pkgconfig_2.0.3  tidyselect_1.1.1 codetools_0.2-18
[29] fansi_1.0.2      crayon_1.4.2     tzdb_0.2.0       dbplyr_2.1.1    
[33] withr_2.4.3      MASS_7.3-55      grid_4.1.3       jsonlite_1.7.3  
[37] xtable_1.8-4     gtable_0.3.0     lifecycle_1.0.1  DBI_1.1.2       
[41] magrittr_2.0.2   scales_1.1.1     estimability_1.3 cli_3.1.1       
[45] stringi_1.7.8    carData_3.0-5    fs_1.5.2         xml2_1.3.3      
[49] ellipsis_0.3.2   generics_0.1.2   vctrs_0.3.8      sandwich_3.0-1  
[53] TH.data_1.1-0    tools_4.1.3      glue_1.6.1       hms_1.1.1       
[57] emmeans_1.7.2    abind_1.4-5      survival_3.3-1   colorspace_2.0-2
younghoo commented 5 months ago

I met the same weird problem. Sadly, this package seems not actively maintained. The safe way may be using emmeans package directly.