statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: (almost) exact hypothesis tests for proportions and contingency tables #2607

Open josef-pkt opened 9 years ago

josef-pkt commented 9 years ago

"Never use Fisher's exact test !"

follow-up to #2605 confidence intervals for two sample proportions

Unconditional exact tests guarantee the size (less than or equal to the nominal size) but are more powerful, i.e. less conservative, in most cases than conditional exact tests like Fisher's exact test.

Exact tests use the (assumed) population distribution, either binomial or multinomial or similar, to evaluate the probability of observing samples whose test statistic is at least as extreme as the observed one.

Aside: I haven't seen it mentioned anywhere yet. Assuming a binomial or multinomial distribution is not innocent; the distributional assumption could be incorrect if we have, for example, overdispersion because of unobserved heterogeneity and/or correlation (with possible underdispersion). Some examples (applications) sound at least "fishy" to me (e.g. incidence counts of contagious disease).

As a prototype I have a test for equality of two independent proportions, similar to http://www4.stat.ncsu.edu/~boos/exact/ (results are identical at the 4 displayed decimals). It is currently brute force only and not optimized; the arrays have the size of the cardinality of the sample space.
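A minimal brute-force sketch of such an unconditional test, to make the idea concrete. This is not the prototype itself: the function names, the pooled z statistic, the grid search over the nuisance parameter, and the convention of setting the statistic to 0 when the pooled variance is 0 are all assumptions of this sketch. Because the maximum is taken over a finite grid, the p-value is only "(almost) exact".

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def pooled_z(x1, n1, x2, n2):
    """z statistic for p1 - p2 using the pooled variance estimate.

    Corner convention (assumption): return 0 when the pooled variance is 0.
    """
    p_pool = (x1 + x2) / (n1 + n2)
    var = p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)
    if var == 0:
        return 0.0
    return (x1 / n1 - x2 / n2) / math.sqrt(var)


def exact_unconditional_pvalue(x1, n1, x2, n2, n_grid=200):
    """Two-sided unconditional p-value for H0: p1 == p2.

    Enumerates the full sample space (brute force) and maximizes the
    rejection probability over a grid of common success probabilities.
    """
    t_obs = abs(pooled_z(x1, n1, x2, n2))
    # all tables with a statistic at least as extreme as the observed one
    region = [
        (i, j)
        for i in range(n1 + 1)
        for j in range(n2 + 1)
        if abs(pooled_z(i, n1, j, n2)) >= t_obs - 1e-12
    ]
    p_max = 0.0
    for g in range(1, n_grid):
        p = g / n_grid  # nuisance parameter: common probability under H0
        pmf1 = [binom_pmf(i, n1, p) for i in range(n1 + 1)]
        pmf2 = [binom_pmf(j, n2, p) for j in range(n2 + 1)]
        prob = sum(pmf1[i] * pmf2[j] for i, j in region)
        p_max = max(p_max, prob)
    return min(p_max, 1.0)


pvalue = exact_unconditional_pvalue(7, 10, 2, 10)
```

Swapping `pooled_z` for another statistic (Wald, likelihood ratio, Fisher's p-value) changes the ordering of the sample space but not the rest of the computation.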

variation: using different test statistics in 1. for 2., 3., and 4.

other approximations:

target sample size

target null and alternative

target tests and sampling

target results

we should get for simple tests what we have in t_test/wald_test after estimating models, plus power and sample size calculation.

connection to models

...

(my short wishlist)

josef-pkt commented 9 years ago

One ambiguity that shows up in some test statistics is how to define the outcome in uncomputable cases. For confidence intervals, the Fagerland et al. 2015 survey article defines the confidence interval for an uncomputable risk ratio to be (0, inf); see section 3, and the comment at the end of section 6.3.1 about Price and Bonett, who don't include these cases in the coverage count.

Several other articles have comments and redefinitions for boundary cases, Koopmans? Loyd? (need to check again). Example: add 0.5 observations if the observation/success count is zero.

(This won't be relevant for calculating a pooled or constrained MLE variance in the case of a positive difference in the null hypothesis if or when one of the proportions is always in the interior, I guess.)

(We possibly need an option to change corner treatment.)
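A sketch of what such an option could look like, using a log-scale Wald interval for the risk ratio as the example. The function name, the `corner` keyword and its values are hypothetical, and the "add 0.5 to counts and sample sizes" variant is just one of the adjustments found in the literature, chosen here for illustration.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(x1, n1, x2, n2, alpha=0.05, corner="full"):
    """Log-scale Wald confidence interval for the risk ratio p1 / p2.

    corner (hypothetical option) controls what happens when a success
    count is zero and the log interval is not computable:
      "full"     : return the uninformative interval (0, inf)
      "add_half" : add 0.5 to counts and sample sizes, then proceed
    """
    if x1 == 0 or x2 == 0:
        if corner == "full":
            return (0.0, math.inf)
        elif corner == "add_half":
            x1, n1, x2, n2 = x1 + 0.5, n1 + 0.5, x2 + 0.5, n2 + 0.5
        else:
            raise ValueError("corner must be 'full' or 'add_half'")
    log_rr = math.log((x1 / n1) / (x2 / n2))
    # standard error of log risk ratio (delta method)
    se = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (math.exp(log_rr - z * se), math.exp(log_rr + z * se))


ci_full = risk_ratio_ci(0, 10, 3, 10)
ci_half = risk_ratio_ci(0, 10, 3, 10, corner="add_half")
```

Which convention is used matters for coverage and size comparisons, since the corner tables carry non-negligible probability in small samples.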

josef-pkt commented 9 years ago

About applicability/recommendations for choosing the test or test statistic: The Fagerland et al. survey papers and similar articles compare test methods over a large or full range of true probabilities. Wald and, to a lesser extent, score tests can be very liberal, but often only in relatively small parts of the space of possible probabilities. These comparisons ignore recommendations that require a minimum expected count in each cell for asymptotic tests to be a good approximation. See for example Newcombe and Nurminen 2011, "In Defence of Score Intervals" (they emphasize two points: minimum expected counts in actual applications, and average size versus only counting size violations).

josef-pkt commented 8 years ago

Adding the Poisson two-sample test here for now: http://stackoverflow.com/questions/33944914/implimentation-of-e-test-for-poisson-in-python with a link to http://www.ucs.louisiana.edu/~kxk4695/JSPI-04.pdf (K. Krishnamoorthy, Jessica Thomson, "A more powerful test for comparing two Poisson means").
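A rough sketch of the E-test idea from that paper for the null hypothesis of equal rates, as I understand it: the p-value is the probability of a statistic at least as extreme as the observed one, evaluated under the Poisson distribution with the rate estimated under the null. The function names, the truncation of the infinite sum, and the treatment of the zero-variance corner are assumptions of this sketch, not the reference implementation.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability, computed on the log scale to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def wald_stat(k1, n1, k2, n2):
    """Difference in rates, standardized with the unpooled variance estimate.

    Corner convention (assumption): 0 when both counts are zero.
    """
    var = k1 / n1**2 + k2 / n2**2
    if var == 0:
        return 0.0
    return (k1 / n1 - k2 / n2) / math.sqrt(var)


def poisson_etest(k1, n1, k2, n2):
    """Two-sided E-test p-value for H0: rate1 == rate2.

    k1, k2 are counts; n1, n2 are sample sizes (exposures).
    """
    if k1 + k2 == 0:
        return 1.0  # no information, statistic is 0 everywhere relevant
    rate0 = (k1 + k2) / (n1 + n2)  # rate estimate under H0
    mu1, mu2 = n1 * rate0, n2 * rate0
    t_obs = abs(wald_stat(k1, n1, k2, n2))
    # truncate the infinite sample space far in the upper tail
    upper1 = int(mu1 + 10 * math.sqrt(mu1) + 20)
    upper2 = int(mu2 + 10 * math.sqrt(mu2) + 20)
    pmf1 = [poisson_pmf(i, mu1) for i in range(upper1 + 1)]
    pmf2 = [poisson_pmf(j, mu2) for j in range(upper2 + 1)]
    pvalue = 0.0
    for i, p1 in enumerate(pmf1):
        for j, p2 in enumerate(pmf2):
            if abs(wald_stat(i, n1, j, n2)) >= t_obs - 1e-12:
                pvalue += p1 * p2
    return min(pvalue, 1.0)


p_diff = poisson_etest(30, 1, 5, 1)
```

The structure is the same as for the two-proportion case, except that the nuisance parameter is replaced by its estimate instead of being maximized over, so the size is not guaranteed.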

josef-pkt commented 8 years ago

#2931 proportions with multinomial sampling (one sample multinomial)

josef-pkt commented 8 years ago

Related: stochastic dominance in the multinomial case, or the two-sample multinomial case. I have not looked for specific references; there is some general stochastic dominance work (for continuous random variables) and there are trend tests for contingency tables. There should be some general stochastic dominance tests for the multinomial, i.e. essentially an extension of one-sided inequality hypotheses to the case of multiple probabilities/proportions.

Also, we (especially I) need an overview list for which test handles which case under what assumptions. E.g. statsmodels.stats.proportions.proportions_chisquare is not a multinomial chisquare test like scipy.stats.chisquare, it's a test for a set of binomial proportions.
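To illustrate the difference between the two hypotheses, here is a small pure-Python sketch that computes both chisquare statistics for the same counts. The helper names and the numbers are made up for illustration; they are not the statsmodels or scipy implementations, only my reading of what the two kinds of test compute.

```python
def gof_chisquare_stat(counts):
    """Multinomial goodness-of-fit statistic against equal cell probabilities.

    The counts are the cells of ONE multinomial sample.
    """
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def binomial_homogeneity_stat(successes, nobs):
    """Chisquare statistic for equality of several binomial proportions.

    Each (success, nobs) pair is a SEPARATE binomial sample; the statistic
    uses both the success and the failure cell of each sample.
    """
    p_pool = sum(successes) / sum(nobs)
    stat = 0.0
    for s, n in zip(successes, nobs):
        stat += (s - n * p_pool) ** 2 / (n * p_pool)
        stat += ((n - s) - n * (1 - p_pool)) ** 2 / (n * (1 - p_pool))
    return stat


counts = [15, 25, 20]
nobs = [50, 50, 50]
stat_multinomial = gof_chisquare_stat(counts)
stat_binomial = binomial_homogeneity_stat(counts, nobs)
```

Here the first statistic is 2.5 and the second is 25/6, about 4.17, both with 2 degrees of freedom, so the same counts lead to different conclusions depending on the sampling assumption.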

josef-pkt commented 7 years ago

Application: http://stats.stackexchange.com/questions/254679/test-selection-help-for-ages-in-habermans-cancer-survival-data I think a stochastic dominance or trend test would have more power than a purely categorical chisquare test. (In terms of causality we would need age as exog and cancer as endog, i.e. Bernoulli with an increasing nonlinear effect of age.) (I have no ready answer without looking things up.)