josef-pkt opened this issue 9 years ago (status: open)
One ambiguity that shows up in some test statistics is how to define the outcome in uncomputable cases. For confidence intervals, the Fagerland et al. (2015) survey article defines the confidence interval for an uncomputable risk ratio as (0, inf); see section 3, and the comment at the end of section 6.3.1 about Price and Bonett, who do not include these cases in the coverage count.
Several other articles comment on and redefine the boundary cases, Koopman? Lloyd? (need to check again). Example: add 0.5 to a cell if its observation/success count is zero.
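The two boundary conventions mentioned above can be sketched side by side; a minimal sketch using the Katz log-scale interval, where `riskratio_confint` and `zero_rule` are hypothetical names, not statsmodels API:

```python
import math
from statistics import NormalDist

def riskratio_confint(x1, n1, x2, n2, alpha=0.05, zero_rule="inf"):
    """Katz log-scale confidence interval for the risk ratio p1/p2.

    zero_rule="inf" follows the Fagerland et al. (2015) convention of
    defining the CI as (0, inf) when the ratio is uncomputable;
    zero_rule="add_half" instead adds 0.5 to each cell of the 2x2
    table (exact conventions for this adjustment vary by author).
    """
    if x1 == 0 or x2 == 0:
        if zero_rule == "inf":
            return 0.0, math.inf
        # add 0.5 to each cell: success counts gain 0.5, totals gain 1
        x1, x2, n1, n2 = x1 + 0.5, x2 + 0.5, n1 + 1.0, n2 + 1.0
    rr = (x1 / n1) / (x2 / n2)
    se = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rr * math.exp(-z * se), rr * math.exp(z * se)
```

With `zero_rule="inf"` a zero count yields the uninformative interval (0, inf); with `zero_rule="add_half"` it yields a finite interval, which is exactly the kind of corner treatment an option could switch between.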
(This won't be relevant for calculating a pooled or constrained MLE variance when the null hypothesis specifies a positive difference, if or when one of the proportions is then always in the interior, I guess.)
(We may need an option to change the treatment of corner cases.)
About applicability/recommendations for choosing the test or test statistic: the Fagerland et al. survey papers and similar articles compare test methods over a large or full range of true probabilities. Wald and, to a lesser extent, score tests can be very liberal, but often only in relatively small parts of the space of possible probabilities. This ignores recommendations that require a minimum expected count in each cell for asymptotic tests to be a good approximation. See for example Newcombe and Nurminen (2011), "In Defence of Score Intervals" (they emphasize two points: minimum expected counts in actual applications, and average size versus only counting size violations).
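The minimum-expected-count recommendation is easy to check mechanically; a small illustration using scipy's contingency-table machinery (the threshold of 5 is the common rule of thumb, not a statsmodels setting):

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 table: successes/failures in two groups with few successes
table = np.array([[4, 16],
                  [1, 19]])
chi2, p, dof, expected = chi2_contingency(table, correction=False)

# common rule of thumb: the asymptotic chi-square approximation is
# suspect if any expected cell count falls below 5
small_cells = expected < 5
```

Here the expected success counts are 2.5 in each group, so the rule of thumb flags this table even though the asymptotic test happily returns a p-value.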
Adding the Poisson two-sample test here for now: http://stackoverflow.com/questions/33944914/implimentation-of-e-test-for-poisson-in-python, with a link to http://www.ucs.louisiana.edu/~kxk4695/JSPI-04.pdf (K. Krishnamoorthy and Jessica Thomson, "A more powerful test for comparing two Poisson means").
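The E-test in that paper is an unconditional test; as a baseline for comparison, the classical conditional test reduces to a binomial test given the total count. A minimal sketch (`poisson_2sample_ctest` is a hypothetical helper name):

```python
from scipy.stats import binomtest

def poisson_2sample_ctest(x1, t1, x2, t2):
    """Conditional test of H0: rate1 == rate2 for Poisson counts x1, x2
    observed over exposures t1, t2.

    Under H0, conditional on the total x1 + x2, x1 is binomial with
    success probability t1 / (t1 + t2).  The E-test of Krishnamoorthy
    and Thomson is an unconditional, typically more powerful,
    alternative to this conditional test.
    """
    n = x1 + x2
    return binomtest(x1, n, t1 / (t1 + t2)).pvalue
```

For example, `poisson_2sample_ctest(10, 1.0, 20, 1.0)` tests whether two counts of 10 and 20 over equal exposures are consistent with a common rate.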
Related: stochastic dominance in the multinomial case, or the two-sample multinomial case. I have not looked for specific references; there is some general stochastic dominance work (for continuous random variables), and there are trend tests for contingency tables. There should be some general stochastic dominance tests for the multinomial, i.e. essentially an extension of one-sided inequality hypotheses to the multiple probability/proportion case.
Also, we (especially I) need an overview list of which test handles which case under what assumptions.
E.g. statsmodels.stats.proportion.proportions_chisquare
is not a multinomial chisquare test like scipy.stats.chisquare; it is a test for a set of binomial proportions.
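The distinction can be shown with scipy alone: the multinomial goodness-of-fit test looks at one vector of category counts, while the equality-of-proportions test is equivalent to a chi-square test on a 2 x k success/failure table (the numbers below are made up for illustration):

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency

# multinomial goodness-of-fit (what scipy.stats.chisquare does):
# do the category counts match the (here uniform) expected frequencies?
counts = np.array([16, 18, 16, 14])
chi2_gof, p_gof = chisquare(counts)

# equality of several binomial proportions (the proportions_chisquare
# case): are the success rates of k independent groups equal?
# Equivalent to a chi-square test on the 2 x k success/failure table.
success = np.array([16, 18, 16, 14])
nobs = np.array([30, 30, 30, 30])
table = np.vstack([success, nobs - success])
chi2_prop, p_prop, dof, expected = chi2_contingency(table, correction=False)
```

The same four numbers enter both tests, but the hypotheses differ: the first treats them as one multinomial sample, the second as success counts out of separate group sizes.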
application: http://stats.stackexchange.com/questions/254679/test-selection-help-for-ages-in-habermans-cancer-survival-data I think a stochastic dominance or trend test would have more power than a purely categorical chisquare test. (In terms of causality we would need age as exog and cancer as endog, i.e. Bernoulli with an increasing nonlinear effect of age.) (I have no ready answer without looking things up.)
"Never use Fisher's exact test !"
follow-up to #2605 confidence intervals for two sample proportions
Unconditional exact tests guarantee the size (less than or equal to the nominal size) but are more powerful, i.e. less conservative in most cases, than conditional exact tests like Fisher's exact test.
Exact tests use the (assumed) population distribution, either binomial, multinomial, or similar, to evaluate the probability of observing samples with a test statistic at least as extreme as the observed one.
Aside: I haven't seen it mentioned anywhere yet, but assuming a binomial or multinomial distribution is not innocent; the distributional assumption could be incorrect if we have, for example, overdispersion because of unobserved heterogeneity and/or correlation (with possible underdispersion). Some examples (applications) sound at least "fishy" to me (e.g. incidence counts of a contagious disease).
As a prototype I have a test for equality of two independent proportions, similar to http://www4.stat.ncsu.edu/~boos/exact/ (results identical at the 4 displayed decimals). (Currently brute force only and not optimized; arrays have the size of the cardinality of the sample space.)
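The brute-force approach described above can be sketched as follows; this is my reconstruction of the idea (absolute pooled score statistic, maximize the tail probability over a grid of nuisance values), not the actual prototype code:

```python
import numpy as np
from scipy.stats import binom

def exact_unconditional_pvalue(x1, n1, x2, n2, ngrid=200):
    """Brute-force unconditional exact test of H0: p1 == p2.

    Enumerates the full sample space (n1+1) x (n2+1), uses the absolute
    pooled score statistic to define "at least as extreme", and
    maximizes the rejection probability over a grid of values of the
    nuisance parameter p.  A sketch, not optimized.
    """
    k1 = np.arange(n1 + 1)[:, None]
    k2 = np.arange(n2 + 1)[None, :]
    pooled = (k1 + k2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    with np.errstate(divide="ignore", invalid="ignore"):
        zstat = np.abs(np.where(var > 0,
                                (k1 / n1 - k2 / n2) / np.sqrt(var),
                                0.0))
    # tables at least as extreme as the observed one
    region = zstat >= zstat[x1, x2] - 1e-12
    # maximize P(region) over the common nuisance probability under H0
    pmax = 0.0
    for p0 in np.linspace(1e-6, 1 - 1e-6, ngrid):
        joint = (binom.pmf(np.arange(n1 + 1), n1, p0)[:, None]
                 * binom.pmf(np.arange(n2 + 1), n2, p0)[None, :])
        pmax = max(pmax, joint[region].sum())
    return pmax
```

The arrays are exactly the size of the sample space cardinality, as in the description above, so this only scales to modest n1, n2; refining the grid around the maximizing p0 would be the obvious next optimization.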
variation: using different test statistics in 1. for 2., 3., and 4.
- other approximations
- target sample size
- target null and alternative
- target tests and sampling
- target results
- we should get for simple tests what we have in t_test/wald_test after estimating models, plus power and sample size calculations
- connection to models
- ...

(my short wishlist)