test not matching aov()

xmarti6 commented 4 years ago

Hi, I recently come up with this strange behaviour, I dont quite understand what's going on, probably something related with maths that I'm not able to decipher.

This dataset works fine:


data<-data.frame(groups=c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D","D","E","E","E","E","E","E","E","F","F","F","F","F","F","F"),
age=c(10,24,39,16,19,20,11,50,28,21,2,34,29,52,44,48,33,56,18,36,37,1,41,42,12,51,35,9,40,30,27,53,5,31,3,13,32,7,45,6,25,43,4,17,26,23,55,22,15,47,46,54,14,38,8,49))

summary(aov(age ~ groups, data = data))
#>             Df Sum Sq Mean Sq F value Pr(>F)
#> groups       5   1220   243.9   0.909  0.483
#> Residuals   50  13410   268.2
tangram::tangram(age ~ groups, data = data)
#> ====================================================
#>         N         age             Test Statistic    
#> ----------------------------------------------------
#> groups                        F_{5,50}=0.91, P=0.483
#>   A     14  15.6 *22.5* 34.4                        
#>   B     14  17.5 *36.5* 44.3                        
#>   C     7   8.7 *30.0* 38.5                         
#>   D     7   8.0 *25.0* 41.2                         
#>   E     7   15.3 *22.0* 25.5                        
#>   F     7   18.0 *46.0* 48.7                        
#> ====================================================

^{Created on 2019-09-27 by the reprex package (v0.3.0)}

However, if I use this other dataset, the output do not match, which do not happens normally:

data<-data.frame(groups=c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D","D","E","E","E","E","E","E","E","F","F","F","F","F","F","F"),
age=c(113,47,141,85,87,139,139,99,75,121,111,107,39,95,47,41,55,53,135,95,115,71,57,43,53,69,67,127,85,71,55,71,97,81,43,61,39,115,57,53,49,69,97,63,99,127,55,69,55,127,89,107,93,75,73,85))

summary(aov(age ~ groups, data = data))
#>             Df Sum Sq Mean Sq F value Pr(>F)  
#> groups       5   9479    1896   2.465 0.0451 *
#> Residuals   50  38448     769                 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tangram::tangram(age ~ groups, data = data)
#> =====================================================
#>         N          age             Test Statistic    
#> -----------------------------------------------------
#> groups                         F_{5,50}=2.62, P=0.035
#>   A     14    84 *103* 123                           
#>   B     14  52.5 *62.0* 96.7                         
#>   C     7   57.7 *71.0* 84.3                         
#>   D     7   49.7 *57.0* 67.7                         
#>   E     7   56.3 *69.0* 98.7                         
#>   F     7   76.7 *89.0* 104.7                        
#> =====================================================

^{Created on 2019-09-27 by the reprex package (v0.3.0)}

There is a difference of 1%, not so much, but still can make a difference. If I calculate Anova with other tools they match the "aov()" function with no difference.

Also, if possible, I would like to know whether tangram function checks normality of variables, because according to shapiro and a qqplot over "age", seems rejecting normality assumption, and the test performed by tangram seems to be anova:

> shapiro.test(data$age)$p.value
[1] 0.03709711

Many thanks

spgarbet commented 4 years ago

Tangram provides a default transform of "Hmisc" which provides all the preferences of summaryM. All tests are non-parametric. In the first example, the F-statistic is the same by fluke. In the second, the comparison is between an F-statistic of a parametric test and from an F-statistic resulting from a Kruskal-Wallis test. Since all test results are non-parametric there is no need to test for normality.

The test in the "hmisc" transform is definitely not ANOVA.

Also, one is not limited to any default non-parametric test when using the master branch. I'm working with someone to provide matched block tests, with all the default summaries but the p-values come from pairing tests to deal with things like propensity in observational data. Follow along with Issue #50 if you want to change the transforms in the easiest manner.

spgarbet commented 4 years ago

From Frank Harrell: "The class K-W is a chi-square stat. Chi-square is what you use with infinite denominator d.f. The F stat comes exactly out of doing a regression on rank(y) vs rank(x). The F dist is slightly more accurate for getting p-values for K-W."

So basically the versions of Kruskal-Wallis that are usually taught first have an implicit assumption of infinite degrees of freedom. The F-stat is the more accurate version.

xmarti6 commented 4 years ago

Waww, awesome, these are much more sophisticated tests than I was aware of, many thanks!

spgarbet commented 4 years ago

They are very good questions to ask. I need to get a paper out that describes these. If you really want to push your stats understanding to the next level, Vanderbilt offers a "Regression Modeling Strategies" short course in the summer.

What I need to do is go back and document better these things and publish. I'm still searching for a good reference on the F-statistic in Kruskal-Wallis statistic if you find one, please post here.

spgarbet / tangram

test not matching aov() #54