statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH hypothesis tests, confint for multinomial proportions (exact) #2931

Open josef-pkt opened 8 years ago

josef-pkt commented 8 years ago

companion to issue #2895

Cai, Yong, and K. Krishnamoorthy. "Exact size and power properties of five tests for multinomial proportions." Communications in Statistics - Simulation and Computation 35, no. 1 (2006): 149-160. (Only 7 citations in Google Scholar, but it looks good to me.)

The probability space for the multinomial distribution is too large for exact calculation except in very small samples. For box/interval probabilities there are some faster algorithms, but they don't look easy to implement. (At least, I didn't try to figure out the details of the algorithms in two recent articles.)
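To illustrate why exact calculation only works for very small samples, here is a minimal sketch of an exact p-value by full enumeration. The function names `compositions` and `exact_pvalue_chisquare` are made up for this illustration and are not statsmodels API. Using the Pearson chisquare statistic to order the sample points is also an assumption; other statistics would work the same way.

```python
from math import comb

import numpy as np
from scipy import stats


def compositions(n, k):
    """Yield all count vectors of length k that sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def exact_pvalue_chisquare(counts, probs):
    """Exact p-value of the Pearson chisquare statistic by enumeration.

    Hypothetical helper: sums the multinomial pmf over all sample points
    whose statistic is at least as large as the observed one.
    """
    counts = np.asarray(counts)
    probs = np.asarray(probs, dtype=float)
    n, k = int(counts.sum()), len(counts)
    expected = n * probs
    stat_obs = ((counts - expected) ** 2 / expected).sum()

    pvalue = 0.0
    for x in compositions(n, k):
        x = np.asarray(x)
        stat = ((x - expected) ** 2 / expected).sum()
        # small tolerance so that ties with the observed value are included
        if stat >= stat_obs - 1e-12:
            pvalue += stats.multinomial.pmf(x, n, probs)
    return pvalue


# The number of sample points is comb(n + k - 1, k - 1).
n_points_small = comb(10 + 3 - 1, 3 - 1)      # n=10, k=3: easy
n_points_large = comb(100 + 10 - 1, 10 - 1)   # n=100, k=10: not feasible

pval = exact_pvalue_chisquare([6, 3, 1], [1 / 3, 1 / 3, 1 / 3])
```

The loop visits every point in the sample space, so the cost grows with `comb(n + k - 1, k - 1)`, which is what rules this approach out beyond tiny problems.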

As alternatives:

Related: They also have the "Nass test", which is the chisquare test with a corrected distribution (scaled chisquare with adjusted degrees of freedom) and which does very well in small, but not tiny, samples. ("Small" in the multinomial chisquare test refers to the expected number of observations in each bin.) Also, compared to the binomial proportion case, both the standard chisquare test and the exact test work better if there are more bins, with smaller liberal resp. conservative deviations from size. The LR (Q) test has a quite distorted size.
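A rough sketch of what a scaled chisquare with adjusted degrees of freedom could look like, assuming the correction is the usual two-moment match: choose `scale` and `df` so that `scale * X2` has the same mean and variance as a chisquare with `df` degrees of freedom, using the exact null mean `k - 1` and exact null variance of the Pearson statistic. Whether this matches the paper's version of the Nass test in every detail is an assumption; `chisquare_nass` is a hypothetical name, not an existing function.

```python
import numpy as np
from scipy import stats


def chisquare_nass(counts, probs):
    """Pearson chisquare test with a moment-matched scaled chisquare.

    Hypothetical helper. Requires n > 1 so that the null variance of the
    statistic is positive.
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n, k = counts.sum(), len(counts)
    expected = n * probs
    stat = ((counts - expected) ** 2 / expected).sum()

    # exact null moments of the Pearson statistic
    mean = k - 1
    var = 2 * (k - 1) + ((1 / probs).sum() - k**2 - 2 * k + 2) / n

    # match the first two moments: scale * stat ~ chi2(df)
    scale = 2 * mean / var
    df = scale * mean
    pvalue = stats.chi2.sf(scale * stat, df)
    return stat, pvalue, scale, df


stat, pvalue, scale, df = chisquare_nass([8, 5, 4, 3], [0.25] * 4)
```

As `n` grows, the variance approaches `2 * (k - 1)`, so `scale` goes to 1 and `df` to `k - 1`, which recovers the standard chisquare test.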

Status: I wrote some functions that mostly work. I'm using a semi-generic function to calculate multinomial probabilities by simulation, based on an indicator callback function.
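The code from this issue is not shown here, so the following is only a sketch of how such a simulation helper might look. All names (`multinomial_prob_mc`, `make_chisquare_indicator`) and the signature are hypothetical. The idea is that the callback maps an array of simulated count vectors to a boolean array, and the probability estimate is the fraction of `True` values.

```python
import numpy as np
from scipy import stats


def multinomial_prob_mc(indicator, n, probs, n_rep=100_000, seed=None):
    """Estimate P(indicator(X) is True) for X ~ Multinomial(n, probs).

    Hypothetical helper. `indicator` receives an array of shape
    (n_rep, k) and must return a boolean array of length n_rep.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multinomial(n, probs, size=n_rep)
    return np.mean(indicator(samples))


def make_chisquare_indicator(n, probs, threshold):
    """Build an indicator for 'Pearson chisquare statistic >= threshold'."""
    expected = n * np.asarray(probs, dtype=float)

    def indicator(samples):
        stat = ((samples - expected) ** 2 / expected).sum(axis=1)
        return stat >= threshold

    return indicator


# Example: simulated size of the asymptotic chisquare test at nominal 5%
probs = [0.2, 0.3, 0.5]
n = 15
crit = stats.chi2.ppf(0.95, len(probs) - 1)
indicator = make_chisquare_indicator(n, probs, crit)
size_mc = multinomial_prob_mc(indicator, n, probs, seed=12345)
```

Keeping the rejection rule in a callback means the same sampler can be reused for size, power (simulate under a different `probs`), or box/interval probabilities by swapping the indicator.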

(I haven't looked yet what R packages are doing in this area.)

josef-pkt commented 8 years ago

maybe another one (compares asymptotic with Bayesian confint for frequentist coverage) (I didn't read it)

Schaarschmidt, Frank, Daniel Gerhard, and Charlotte Vogel. 2017. “Simultaneous Confidence Intervals for Comparisons of Several Multinomial Samples.” Computational Statistics & Data Analysis 106 (February): 65–76. doi:10.1016/j.csda.2016.09.004.