statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License
10.15k stars 2.88k forks source link

Bugs for function statsmodels.stats.proportion.proportion_confint #7008

Closed jinyuan-jia closed 2 years ago

jinyuan-jia commented 4 years ago

Describe the bug

Hello, I met a bug when I use function statsmodels.stats.proportion.proportion_confint to estimate confidence interval. I have attached the code to reproduce the issue.

Code Sample, a copy-pastable example if possible

from statsmodels.stats.proportion import proportion_confint count_list=[0,10,100,1000] for count in count_list: print(proportion_confint(min(max(count, 1e-15), 100000-1e-15), 100000, alpha=1e-11, method="beta"))

Expected Output

The output is as follows (lower bound, upper bound): (0.0, 0.00026020034420181726) (3.4637672992749764e-06, 0.059396009405412224) (0.00046224918454042924, 0.0018509801594462916) (0.008004250120330274, 0.012302551885285284)

As the result shows, when count=10, the upper bound is 0.059. However, when count=100, the upper bound is 0.00185 (smaller than 0.059) which is undesired. In particular, the upper bound should increase with the increasing of count.

Output of import statsmodels.api as sm; sm.show_versions()

[paste the output of ``import statsmodels.api as sm; sm.show_versions()`` here below this line] INSTALLED VERSIONS ------------------ Python: 3.6.9.final.0 OS: Linux 4.15.0-109-generic #110-Ubuntu SMP Tue Jun 23 02:39:32 UTC 2020 x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 statsmodels =========== Installed: 0.12.0 Required Dependencies ===================== cython: 0.29.21 numpy: 1.19.1 scipy: 1.5.2 pandas: 1.1.1 dateutil: 2.8.1 patsy: 0.5.1
josef-pkt commented 4 years ago

no idea yet exact power functions for proportions are not monotonic, but I guess (!) that shouldn't be the case here.

"beta" method just calls stats.beta.ppf and isf. So if there is a bug, then it is most likely some numerical problems in scipy special for some corner/extreme cases.

adjusted case, non-monotonicity starts at alpha=1e-5

count_list=np.arange(1, 20) #[0,10,100,1000]
alpha=1e-11
alpha=1e-5
res = []
for count in count_list:
    print(min(max(count, 1e-15), 100000-1e-15))
    low, upp = proportion_confint(min(max(count, 1e-15), 100000-1e-15), 100000, alpha=alpha, method="beta")
    print((low, upp))
    res.append([low, upp])

res = np.asarray(res)

print("\n mindiff", np.diff(res, axis=0).min(0))

mindiff [ 3.16063160e-08 -3.50507594e-02]

upp is shifted upwards for count is 3, 4 or 5

jinyuan-jia commented 4 years ago

Thanks for your time. I checked the source code and I guess the issue comes from scipy.stats.beta.isf . Again, thanks for your response.

josef-pkt commented 4 years ago

I opened an issue for scipy.special

jinyuan-jia commented 4 years ago

Thanks josef-pkt.

josef-pkt commented 2 years ago

looks like scipy 1.7 should have this problem fixed underlying functions for beta are now taken from Boost.

needs checking, then we can close this issue

jinyuan-jia commented 2 years ago

Thank you josef-ptk. Let me close the issue.