tea-lang-org / tea-lang

DSL for experimental design and statistical analysis
Apache License 2.0

Null Hypothesis Not Rejected as Expected #33

Open. meli365 opened this issue 4 years ago

meli365 commented 4 years ago

I am working with this example, with lines 24-25 uncommented so that Weight is defined as an ordinal variable.

When using this hypothesis:

results = tea.hypothesize(['Sport', 'Weight'], ['Sport:Wrestling > Swimming'])

the null hypothesis is rejected as expected, with the following output:

The median of Weight for Sport = Wrestling is significantly greater than the median for Sport = Swimming.

We expect that flipping the inequality would also result in rejection of the null hypothesis:

results = tea.hypothesize(['Sport', 'Weight'], ['Sport:Wrestling < Swimming'])

but instead it produces the following output:

There is no difference in medians between Sport = Swimming and Sport = Wrestling on Weight.

Note: The test statistic, p-value, alpha, degrees of freedom (dof), effect size, etc. are identical in the output for both hypotheses.

MaLiN2223 commented 4 years ago

I debugged the code and this is what I found: the line below calculates 1 - p-value for a 'less than' prediction, but the Mann–Whitney U test returns an identical p-value for both examples. https://github.com/emjun/tea-lang/blob/a93dc6ad249d9519c6b99b8af034b3e319e216a2/tea/runtimeDataStructures/testResult.py#L184

Note that we are running a two-sided Mann–Whitney test here: https://github.com/emjun/tea-lang/blob/a93dc6ad249d9519c6b99b8af034b3e319e216a2/tea/helpers/evaluateHelperMethods.py#L484
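
To illustrate why a two-sided test gives the same p-value for both inequalities, here is a minimal sketch in plain Python. It is not Tea's code and not the SciPy implementation: the weights are made up, the function names `u_statistic` and `exact_p_values` are hypothetical, and the p-values come from brute-force enumeration of all group assignments, which is only practical for tiny samples.

```python
from itertools import combinations


def u_statistic(x, y):
    # Count (x, y) pairs where the x value exceeds the y value; ties count as half.
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_p_values(x, y):
    # Exact permutation p-values for the Mann-Whitney U statistic of x versus y.
    pooled = list(x) + list(y)
    observed = u_statistic(x, y)
    mean_u = len(x) * len(y) / 2  # expected U under the null hypothesis
    us = []
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        us.append(u_statistic(gx, gy))
    total = len(us)
    return {
        "greater": sum(u >= observed for u in us) / total,
        "less": sum(u <= observed for u in us) / total,
        "two-sided": sum(abs(u - mean_u) >= abs(observed - mean_u) for u in us) / total,
    }


# Hypothetical weights, chosen so that every wrestler outweighs every swimmer.
wrestling = [85, 90, 95, 100, 105]
swimming = [60, 65, 70, 75, 80]

p_forward = exact_p_values(wrestling, swimming)  # Wrestling compared to Swimming
p_swapped = exact_p_values(swimming, wrestling)  # Swimming compared to Wrestling

print("two-sided:", p_forward["two-sided"], p_swapped["two-sided"])
print("Wrestling > Swimming:", p_forward["greater"])
print("Wrestling < Swimming:", p_forward["less"])
```

With these fully separated samples, the two-sided p-value is 2/252 whichever group is listed first, so it carries no information about direction. The one-sided p-values do differ: 1/252 for "Wrestling > Swimming" and 1.0 for "Wrestling < Swimming".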

emjun commented 4 years ago

Looking into this further, I don't think this is a bug.

We would not expect both of those inequalities to reject the null hypothesis, because they test two different hypotheses.

By expressing a one-sided hypothesis, we are not looking for effects in the other direction.

What is confusing is that the null hypothesis is the same for one-sided (greater or lesser) and two-sided hypotheses. The null hypothesis is that there is no difference in medians.
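
Written out in the terms used in this thread, the shared null hypothesis and the three possible alternatives look like this. The symbols m_W and m_S (the median Weight for Wrestling and for Swimming) are notation introduced here for illustration, not something Tea outputs.

```latex
\begin{aligned}
H_0 &: m_W = m_S && \text{(same null hypothesis in all three cases)} \\
H_1^{>} &: m_W > m_S && \text{(one-sided, `Sport:Wrestling > Swimming')} \\
H_1^{<} &: m_W < m_S && \text{(one-sided, `Sport:Wrestling < Swimming')} \\
H_1^{\neq} &: m_W \neq m_S && \text{(two-sided)}
\end{aligned}
```

Rejecting H_0 in favor of one one-sided alternative says nothing in favor of the other, which is why the two inequalities lead to different conclusions on the same data.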

I find this page helpful in providing general background and a similar example: https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-the-differences-between-one-tailed-and-two-tailed-tests/

emjun commented 4 years ago

I think this is a great example of both the limitations of Tea's output and the opportunities to improve it, and the same applies to many other tools. @meli365