raphaelvallat / pingouin

Statistical package in Python based on Pandas
https://pingouin-stats.org/
GNU General Public License v3.0
1.6k stars 137 forks source link

pg.bayesfactor_pearson(r=-0.856, n=64, alternative="greater") returns wrong BF value #427

Open tomasdominik opened 2 months ago

tomasdominik commented 2 months ago

I have two sets of paired data (n=64). Person correlation between the sets is -0.856 (precise value -0.856341390601075). Using pg.bayesfactor_pearson(r=-0.856, n=64, alternative="two-sided"), I receive correct Bayes factor of ~2.7309e+16 (checked with JASP 0.17).

Using pg.bayesfactor_pearson(r=-0.856, n=64, alternative="less"), I receive Bayes factor of ~5.4617e+16, which also corresponds with JASP.

However, pg.bayesfactor_pearson(r=-0.856, n=64, alternative="greater") returns Bayes factor of 976, which is incorrect (JASP shows 1e-317). It also obviously cannot be the right answer either way, because the correlation is negative, and so one-tailed correlation test assuming positive correlation cannot show evidence for the alternative.

Is this a bug in pingouin or is this a case of float underflow due to the miniscule size of the correct BF?

raphaelvallat commented 1 week ago

Thanks for opening the issue. I finally found some time to dive into this. I think it is the latter, but unfortunately I have not been able to find a solution for it. For reference:

for r in [-0.95, -0.8, -0.6, -0.54, -0.5, -0.2, -0.1]:
    print(f"r = {r}, n = 200")
    pg.bayesfactor_pearson(r=r, n=200, alternative="less", method="ly")
    print()

gives

r = -0.95, n = 200
two-tailed = nan, less = nan, greater = nan

r = -0.8, n = 200
two-tailed = 2.726641354240274e+42, less = 5.45328270848037e+42, greater = 1.82596155794594e+29

r = -0.6, n = 200
two-tailed = 8.807899473881192e+17, less = 1.76157989477619e+18, greater = 50944.0

r = -0.54, n = 200
two-tailed = 41931590197275.336, less = 83863180394547.7, greater = 3.0

r = -0.5, n = 200
two-tailed = 156109336868.3086, less = 312218673736.598, greater = 0.01953125

r = -0.2, n = 200
two-tailed = 4.839589594799137, less = 9.65632205215847, greater = 0.0228571374398054

r = -0.1, n = 200
two-tailed = 0.23705535587016244, less = 0.435957165271682, greater = 0.0381535464686424

Note the incorrect shift from a BF_greater < 1 to a BF_greater > 1 at r=-0.54. Instead, we'd expect to see BF_greater becoming smaller and smaller with a more negative r.

I think a workaround could be to set BF_greater to 0 if BF_less is very large (and vice versa).

tomasdominik commented 1 week ago

I agree, that's exactly the workaround I ended up using. There probably is a more correct way to deal with the problem, but applying what you suggest (perhaps with a warning for the user) might be good enough. Thank you for looking into it.