josef-pkt opened this issue 9 years ago (status: open)
One ambiguity that shows up in some test statistics is how to define the outcome in uncomputable cases. For confidence intervals, the Fagerland et al. (2015) survey article defines the confidence interval for an uncomputable risk ratio as (0, inf); see section 3, and the comment at the end of section 6.3.1 about Price and Bonett, who do not include these cases in the coverage count.
Several other articles comment on and redefine the boundary cases, Koopman? Lloyd? (need to check again). Example: add 0.5 to a cell if its observation/success count is zero.
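The two boundary conventions mentioned above can be sketched side by side; a minimal sketch using the Katz log-scale interval, where `riskratio_confint` and `zero_rule` are hypothetical names, not statsmodels API:

```python
import math
from statistics import NormalDist

def riskratio_confint(x1, n1, x2, n2, alpha=0.05, zero_rule="inf"):
    """Katz log-scale confidence interval for the risk ratio p1/p2.

    zero_rule="inf" follows the Fagerland et al. (2015) convention of
    defining the CI as (0, inf) when the ratio is uncomputable;
    zero_rule="add_half" instead adds 0.5 to each cell of the 2x2
    table (exact conventions for this adjustment vary by author).
    """
    if x1 == 0 or x2 == 0:
        if zero_rule == "inf":
            return 0.0, math.inf
        # add 0.5 to each cell: success counts gain 0.5, totals gain 1
        x1, x2, n1, n2 = x1 + 0.5, x2 + 0.5, n1 + 1.0, n2 + 1.0
    rr = (x1 / n1) / (x2 / n2)
    se = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rr * math.exp(-z * se), rr * math.exp(z * se)
```

With `zero_rule="inf"` a zero count yields the uninformative interval (0, inf); with `zero_rule="add_half"` it yields a finite interval, which is exactly the kind of corner treatment an option could switch between.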
(This won't be relevant for calculating a pooled or constrained MLE variance when the null hypothesis specifies a positive difference, if or when one of the proportions is then always in the interior, I guess.)
(We may need an option to change the treatment of corner cases.)
About applicability/recommendations for choosing the test or test statistic: the Fagerland et al. survey papers and similar articles compare test methods over a large or full range of true probabilities. Wald and, to a lesser extent, score tests can be very liberal, but often only in relatively small parts of the space of possible probabilities. This ignores recommendations that require a minimum expected count in each cell for asymptotic tests to be a good approximation. See for example Newcombe and Nurminen (2011), "In Defence of Score Intervals" (they emphasize two points: minimum expected counts in actual applications, and average size versus only counting size violations).
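The minimum-expected-count recommendation is easy to check mechanically; a small illustration using scipy's contingency-table machinery (the threshold of 5 is the common rule of thumb, not a statsmodels setting):

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 table: successes/failures in two groups with few successes
table = np.array([[4, 16],
                  [1, 19]])
chi2, p, dof, expected = chi2_contingency(table, correction=False)

# common rule of thumb: the asymptotic chi-square approximation is
# suspect if any expected cell count falls below 5
small_cells = expected < 5
```

Here the expected success counts are 2.5 in each group, so the rule of thumb flags this table even though the asymptotic test happily returns a p-value.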
Adding the Poisson two-sample test here for now: http://stackoverflow.com/questions/33944914/implimentation-of-e-test-for-poisson-in-python, with a link to http://www.ucs.louisiana.edu/~kxk4695/JSPI-04.pdf (K. Krishnamoorthy and Jessica Thomson, "A more powerful test for comparing two Poisson means").
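The E-test in that paper is an unconditional test; as a baseline for comparison, the classical conditional test reduces to a binomial test given the total count. A minimal sketch (`poisson_2sample_ctest` is a hypothetical helper name):

```python
from scipy.stats import binomtest

def poisson_2sample_ctest(x1, t1, x2, t2):
    """Conditional test of H0: rate1 == rate2 for Poisson counts x1, x2
    observed over exposures t1, t2.

    Under H0, conditional on the total x1 + x2, x1 is binomial with
    success probability t1 / (t1 + t2).  The E-test of Krishnamoorthy
    and Thomson is an unconditional, typically more powerful,
    alternative to this conditional test.
    """
    n = x1 + x2
    return binomtest(x1, n, t1 / (t1 + t2)).pvalue
```

For example, `poisson_2sample_ctest(10, 1.0, 20, 1.0)` tests whether two counts of 10 and 20 over equal exposures are consistent with a common rate.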
Related: stochastic dominance in the multinomial case, or the two-sample multinomial case. I have not looked for specific references; there is some general stochastic dominance work (for continuous random variables), and there are trend tests for contingency tables. There should be some general stochastic dominance tests for the multinomial, i.e. essentially an extension of one-sided inequality hypotheses to the multiple probability/proportion case.
Also, we (especially I) need an overview list of which test handles which case under what assumptions.
E.g. statsmodels.stats.proportion.proportions_chisquare
is not a multinomial chisquare test like scipy.stats.chisquare; it is a test for a set of binomial proportions.
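The distinction can be shown with scipy alone: the multinomial goodness-of-fit test looks at one vector of category counts, while the equality-of-proportions test is equivalent to a chi-square test on a 2 x k success/failure table (the numbers below are made up for illustration):

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency

# multinomial goodness-of-fit (what scipy.stats.chisquare does):
# do the category counts match the (here uniform) expected frequencies?
counts = np.array([16, 18, 16, 14])
chi2_gof, p_gof = chisquare(counts)

# equality of several binomial proportions (the proportions_chisquare
# case): are the success rates of k independent groups equal?
# Equivalent to a chi-square test on the 2 x k success/failure table.
success = np.array([16, 18, 16, 14])
nobs = np.array([30, 30, 30, 30])
table = np.vstack([success, nobs - success])
chi2_prop, p_prop, dof, expected = chi2_contingency(table, correction=False)
```

The same four numbers enter both tests, but the hypotheses differ: the first treats them as one multinomial sample, the second as success counts out of separate group sizes.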
application: http://stats.stackexchange.com/questions/254679/test-selection-help-for-ages-in-habermans-cancer-survival-data I think a stochastic dominance or trend test would have more power than a purely categorical chisquare test. (In terms of causality we would need age as exog and cancer as endog, i.e. Bernoulli with an increasing nonlinear effect of age.) (I have no ready answer without looking things up.)
"Never use Fisher's exact test !"
follow-up to #2605 confidence intervals for two sample proportions
Unconditional exact tests guarantee the size (less than or equal to the nominal size) but are more powerful, i.e. less conservative in most cases, than conditional exact tests like Fisher's exact test.
Exact tests use the (assumed) population distribution, either binomial, multinomial, or similar, to evaluate the probability of observing samples with a test statistic at least as extreme as the observed one.
Aside: I haven't seen it mentioned anywhere yet, but assuming a binomial or multinomial distribution is not innocent; the distributional assumption could be incorrect if we have, for example, overdispersion because of unobserved heterogeneity and/or correlation (with possible underdispersion). Some examples (applications) sound at least "fishy" to me (e.g. incidence counts of a contagious disease).
As a prototype I have a test for equality of two independent proportions, similar to http://www4.stat.ncsu.edu/~boos/exact/ (results identical at the 4 displayed decimals). (Currently brute force only and not optimized; arrays have the size of the cardinality of the sample space.)
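The brute-force approach described above can be sketched as follows; this is my reconstruction of the idea (absolute pooled score statistic, maximize the tail probability over a grid of nuisance values), not the actual prototype code:

```python
import numpy as np
from scipy.stats import binom

def exact_unconditional_pvalue(x1, n1, x2, n2, ngrid=200):
    """Brute-force unconditional exact test of H0: p1 == p2.

    Enumerates the full sample space (n1+1) x (n2+1), uses the absolute
    pooled score statistic to define "at least as extreme", and
    maximizes the rejection probability over a grid of values of the
    nuisance parameter p.  A sketch, not optimized.
    """
    k1 = np.arange(n1 + 1)[:, None]
    k2 = np.arange(n2 + 1)[None, :]
    pooled = (k1 + k2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    with np.errstate(divide="ignore", invalid="ignore"):
        zstat = np.abs(np.where(var > 0,
                                (k1 / n1 - k2 / n2) / np.sqrt(var),
                                0.0))
    # tables at least as extreme as the observed one
    region = zstat >= zstat[x1, x2] - 1e-12
    # maximize P(region) over the common nuisance probability under H0
    pmax = 0.0
    for p0 in np.linspace(1e-6, 1 - 1e-6, ngrid):
        joint = (binom.pmf(np.arange(n1 + 1), n1, p0)[:, None]
                 * binom.pmf(np.arange(n2 + 1), n2, p0)[None, :])
        pmax = max(pmax, joint[region].sum())
    return pmax
```

The arrays are exactly the size of the sample space cardinality, as in the description above, so this only scales to modest n1, n2; refining the grid around the maximizing p0 would be the obvious next optimization.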
variation: using different test statistics in 1. for 2., 3., and 4.
- other approximations
- target sample size
- target null and alternative
- target tests and sampling
- target results
- we should get for simple tests what we have in t_test/wald_test after estimating models, plus power and sample size calculations
- connection to models
- ...

(my short wishlist)