tompollard / tableone

Create "Table 1" for research papers in Python
https://pypi.python.org/pypi/tableone/
MIT License

added `chi_correction` as argument since you risk overfitting the data #113

Closed: jakob1379 closed this 2 years ago

jakob1379 commented 3 years ago

I am currently writing my thesis and stumbled across this almost-finished masterpiece. After fiddling around with it, I couldn't figure out why some of the p-values seemed extremely optimistic. After going through the code thoroughly, I found that the call to `chi2_contingency` offered no way to disable the continuity correction.
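A minimal sketch of what exposing the flag could look like, assuming the function in question is SciPy's `scipy.stats.chi2_contingency`, whose `correction` keyword (default `True`) switches Yates's correction on or off. The helper `categorical_pvalue` and the example counts are hypothetical and are not tableone's actual code; `chi_correction` simply reuses the argument name from this PR's title.

```python
import numpy as np
from scipy.stats import chi2_contingency


def categorical_pvalue(observed, chi_correction=True):
    """Hypothetical helper: chi-squared p-value with a switchable Yates correction.

    `chi_correction` is passed straight through to SciPy's `correction`
    parameter, which only has an effect when the contingency table has
    one degree of freedom (e.g. a 2 x 2 table).
    """
    _, pvalue, _, _ = chi2_contingency(np.asarray(observed), correction=chi_correction)
    return pvalue


table = [[10, 20], [30, 15]]  # made-up 2 x 2 counts for illustration only
p_corrected = categorical_pvalue(table, chi_correction=True)
p_uncorrected = categorical_pvalue(table, chi_correction=False)
print(f"with Yates correction:    p = {p_corrected:.4f}")
print(f"without Yates correction: p = {p_uncorrected:.4f}")
```

Because SciPy applies the correction by default, a pass-through flag like this lets the caller choose either behavior explicitly; for tables larger than 2 × 2 the flag makes no difference.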

From Wikipedia ("Yates's correction for continuity"):

Using the chi-squared distribution to interpret Pearson's chi-squared statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-squared distribution. This assumption is not quite correct and introduces some error.

To reduce the error in approximation, Frank Yates suggested a correction for continuity that adjusts the formula for Pearson's chi-squared test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table.[9] This reduces the chi-squared value obtained and thus increases its p-value.
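The correction described in the quote can be checked by hand. This is a sketch assuming NumPy and SciPy are available; `yates_chi2` is a hypothetical helper that applies χ² = Σ (|O − E| − 0.5)² / E to a 2 × 2 table, and the example counts are made up. The helper clips each adjusted difference at zero so no cell is corrected past its expected value; in the example table every |O − E| equals 6, so the clip has no effect there.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency


def yates_chi2(observed):
    """Pearson's chi-squared with Yates's continuity correction for a 2 x 2 table."""
    observed = np.asarray(observed, dtype=float)
    if observed.shape != (2, 2):
        raise ValueError("Yates's correction is defined here for 2 x 2 tables only")
    total = observed.sum()
    # Expected counts under independence: row total * column total / grand total.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total
    # Subtract 0.5 from each |O - E|, clipping at zero so no cell is over-corrected.
    adjusted = np.maximum(np.abs(observed - expected) - 0.5, 0.0)
    stat = np.sum(adjusted**2 / expected)
    # A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
    pvalue = chi2.sf(stat, df=1)
    return stat, pvalue


table = [[10, 20], [30, 15]]  # expected counts: [[16, 14], [24, 21]]
stat, p = yates_chi2(table)
ref_stat, ref_p, _, _ = chi2_contingency(table, correction=True)
raw_stat, raw_p, _, _ = chi2_contingency(table, correction=False)
print(f"manual Yates:      chi2 = {stat:.4f}, p = {p:.4f}")
print(f"SciPy (corrected): chi2 = {ref_stat:.4f}, p = {ref_p:.4f}")
print(f"SciPy (raw):       chi2 = {raw_stat:.4f}, p = {raw_p:.4f}")
```

The corrected statistic comes out smaller than the uncorrected one and its p-value larger, which is exactly the effect the quoted passage describes.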


After adding this, I confirmed that my results were correct. Hope you find this useful :)

jakob1379 commented 2 years ago

Was this unintentionally closed as well? 😊

tompollard commented 2 years ago

Yes, same! We're trying to spend some time on improvements, bug fixes, etc., and we'll be working through the outstanding PRs. Sorry it's taken so long to get to this. Feel free to add suggestions, etc., to the project board at: https://github.com/users/tompollard/projects/1

jakob1379 commented 2 years ago

Don't sweat it, I'm just happy to see the project is alive. It really made my thesis much easier to deal with! 😊