statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

FAQ/FAQ-D: statistical tests: sign convention for 2-sample comparison #811

Open josef-pkt opened 11 years ago

josef-pkt commented 11 years ago

edit: summary

The current convention for two-sample tests treats "sample2", the second sample, as the reference case, with signature function(sample1, sample2, value=....)

H0: stat(sample1) - stat(sample2) - value = 0
versus
H1:  stat(sample1) - stat(sample2) - value != 0

and analogous one-sided and TOST versions
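To make the one-sided and TOST versions concrete, here is a minimal sketch under the same convention (difference is sample1 minus sample2, sample2 is the reference). It uses a difference in means with a normal approximation; the helper names `diff_and_se`, `one_sided_pvalue` and `tost_pvalue` are hypothetical and are not statsmodels functions.

```python
import math
from statistics import NormalDist, mean, variance


def diff_and_se(sample1, sample2):
    """Mean difference (sample1 minus sample2) and its standard error,
    using unpooled sample variances."""
    d = mean(sample1) - mean(sample2)
    se = math.sqrt(variance(sample1) / len(sample1)
                   + variance(sample2) / len(sample2))
    return d, se


def one_sided_pvalue(sample1, sample2, value=0.0, alternative="larger"):
    """H1: mean1 - mean2 - value > 0 ("larger") or < 0 ("smaller")."""
    d, se = diff_and_se(sample1, sample2)
    z = (d - value) / se
    cdf = NormalDist().cdf
    if alternative == "larger":
        return 1.0 - cdf(z)
    if alternative == "smaller":
        return cdf(z)
    raise ValueError("alternative must be 'larger' or 'smaller'")


def tost_pvalue(sample1, sample2, low, upp):
    """TOST with H1: low < mean1 - mean2 < upp.
    The p-value is the larger of the two one-sided p-values."""
    p_low = one_sided_pvalue(sample1, sample2, value=low, alternative="larger")
    p_upp = one_sided_pvalue(sample1, sample2, value=upp, alternative="smaller")
    return max(p_low, p_upp)
```

Note that swapping the two samples only gives the same one-sided result if both `value` and the direction of the alternative are flipped, which is why the convention has to be documented.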

variation for ratio

H0: stat(sample1) - stat(sample2) * value = 0
versus
H1:  stat(sample1) - stat(sample2) * value != 0

or

H0: stat(sample1) / stat(sample2)  - value = 0
versus
H1:  stat(sample1) / stat(sample2)  - value != 0
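A small sketch of the two ratio formulations, assuming scalar statistics; the function names are hypothetical. Both are zero on the same null (for nonzero stat2), but they live on different scales, and the linear form stays defined when stat2 is zero.

```python
def ratio_null_linear(stat1, stat2, value=1.0):
    """Linear form: stat1 - stat2 * value (sample2 is the reference)."""
    return stat1 - stat2 * value


def ratio_null_quotient(stat1, stat2, value=1.0):
    """Quotient form: stat1 / stat2 - value; requires stat2 != 0."""
    return stat1 / stat2 - value


# On the null stat1 / stat2 == value, both forms are zero.
lin_null = ratio_null_linear(6.0, 3.0, value=2.0)
quo_null = ratio_null_quotient(6.0, 3.0, value=2.0)

# Away from the null, the linear form equals stat2 times the quotient form,
# so the two agree in sign only when stat2 is positive.
lin_alt = ratio_null_linear(7.0, 3.0, value=2.0)
quo_alt = ratio_null_quotient(7.0, 3.0, value=2.0)
```

Swapping which sample is the reference replaces the ratio by its inverse, so `value` would have to become `1 / value` to test the same hypothesis.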

original comment

stat_test(sample1, sample2) or stat_test(sample0, sample1)?

Do we use stat(sample1) - stat(sample2), stat(sample2) - stat(sample1), or stat(sample1) - stat(sample0) (as in gof.chisquare_effectsize)?

I just changed proportion.proportion_effectsize to the first version to match R pwr.
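As an illustration of what the sign convention means for an effect size, here is a sketch assuming the arcsine-transformed difference (Cohen's h). The name `cohens_h` is hypothetical; this is not a copy of the statsmodels implementation.

```python
import math


def cohens_h(prop1, prop2):
    """Arcsine-transformed difference of two proportions,
    prop1 minus prop2, so prop2 acts as the reference."""
    return 2 * (math.asin(math.sqrt(prop1)) - math.asin(math.sqrt(prop2)))


h_forward = cohens_h(0.6, 0.4)   # positive: prop1 is above the reference
h_reversed = cohens_h(0.4, 0.6)  # same magnitude, opposite sign
```

For power calculations with a two-sided alternative only the magnitude matters, but for one-sided alternatives the sign decides which direction is being tested.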

This needs a consistency check and possible refactoring across all statistical tests.

chisquare_effectsize(p0, p1) uses the same argument order as R pwr, but it looks reversed to me compared to the proportion effect size.

It is also used for one-sample comparisons, where one of the two arguments is the hypothesized value. In two-sample tests, we also have cases with sample1, sample2, value, where the null hypothesis is stat(sample1) - stat(sample2) - value = 0 (value is sometimes also called diff).

josef-pkt commented 6 years ago

weightstats _zstat_generic uses zstat = (value1 - value2 - diff) / std_diff
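A sketch of that formula with p-values for the three alternatives, to show how the sign of the statistic depends on argument order. `zstat_sketch` is a hypothetical stand-in written for this comment, not the actual `_zstat_generic` code, and the alternative names are assumptions.

```python
from statistics import NormalDist


def zstat_sketch(value1, value2, std_diff, alternative="two-sided", diff=0.0):
    """z statistic for H0: value1 - value2 - diff = 0, value2 as reference."""
    zstat = (value1 - value2 - diff) / std_diff
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        pvalue = 2 * (1 - cdf(abs(zstat)))
    elif alternative == "larger":      # H1: value1 - value2 - diff > 0
        pvalue = 1 - cdf(zstat)
    elif alternative == "smaller":     # H1: value1 - value2 - diff < 0
        pvalue = cdf(zstat)
    else:
        raise ValueError("unknown alternative")
    return zstat, pvalue


z_fwd, p_fwd = zstat_sketch(1.0, 0.5, 0.25)
z_rev, p_rev = zstat_sketch(0.5, 1.0, 0.25)  # swapped arguments flip the sign
```

The two-sided p-value is unchanged by the swap, which is why a wrong or undocumented convention tends to go unnoticed until one-sided tests or nonzero `diff` are used.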

The docstring of ttest_ind(x1, x2, ... mentions "difference" but doesn't specify whether diff = mu1 - mu2 or diff = mu2 - mu1.