rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

permutation_test() fails to compute the correct p-value #838

Closed tcortesp closed 3 years ago

tcortesp commented 3 years ago

Describe the bug

permutation_test() does not count permutations whose test statistic is equal to the reference (observed) value when computing the p-value of the test.

Steps/Code to Reproduce

Consider the following example from Table 5.4 in Causal Inference for Statistics, Social, and Biomedical Sciences.

from mlxtend.evaluate import permutation_test

tr = [3, 5, 0]  # treatment group
ct = [4, 0, 1]  # control group

p_value = permutation_test(tr, ct, method='exact')
print(p_value)

Expected vs Actual Results

The p-value should be 16/20 = 0.8. Instead, permutation_test() returns 0.5 (10/20). Looking behind the scenes, the problem is that the 6 permutations whose statistic is equal to the reference value are not counted when computing the p-value.
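The counts above can be checked by hand with a brute-force enumeration. This is only a minimal sketch, not mlxtend's implementation: it assumes the test statistic is the absolute difference in group means, and the helper name mean_diff is made up for illustration. Fractions are used so that ties are compared exactly rather than through floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

tr = [3, 5, 0]  # treatment group
ct = [4, 0, 1]  # control group


def mean_diff(x, y):
    """Absolute difference in means, as an exact fraction (assumed statistic)."""
    return abs(Fraction(sum(x), len(x)) - Fraction(sum(y), len(y)))


combined = tr + ct
observed = mean_diff(tr, ct)

n_total = n_ge = n_gt = 0
# Enumerate every way to assign len(tr) of the pooled values to "treatment".
for idx in combinations(range(len(combined)), len(tr)):
    chosen = set(idx)
    x = [combined[i] for i in chosen]
    y = [combined[i] for i in range(len(combined)) if i not in chosen]
    stat = mean_diff(x, y)
    n_total += 1
    n_ge += stat >= observed  # ties count as "at least as extreme"
    n_gt += stat > observed   # ties ignored

print(f"inclusive (>=): {n_ge}/{n_total} = {n_ge / n_total}")
print(f"strict    (>):  {n_gt}/{n_total} = {n_gt / n_total}")
```

Under these assumptions the inclusive count gives 16/20 and the strict count gives 10/20, so the 6 tied permutations account for the whole gap between 0.8 and 0.5.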

Versions

MLxtend 0.14.0
Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Python 3.7.11 (default, Jul 3 2021, 18:01:19) [GCC 7.5.0]
Scikit-learn 0.22.2.post1
NumPy 1.19.5
SciPy 1.4.1

tcortesp commented 3 years ago

I realized that Google Colab was using an outdated version of mlxtend (0.14.0).

rasbt commented 3 years ago

Glad that was resolved, but yeah, it's frustrating. I hope they update it sometime, because I get a lot of questions due to the old version they are using.

You probably already did that, but if not, you can update it on Colab at least for your current session via

!pip install mlxtend --upgrade
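To confirm which version is actually installed after upgrading, something like the following sketch may help. It assumes Python 3.8 or newer (for importlib.metadata), and the helper name installed_version is hypothetical. If mlxtend was already imported earlier in the session, the runtime may need a restart before the upgraded code is used.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="mlxtend"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```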