def correlation_pearsonr2(data0, data1):
    return stats.pearsonr(data0, data1)**2

def correlation_p_pearsonr2(data0, data1):
    correlation = correlation_pearsonr2(data0, data1)
    if math.isnan(correlation):
        return float('NaN')
    if correlation == 1.:
        return 0.
    n = len(data0)
    assert n == len(data1)
    # Compute observed t statistic.
    t = correlation * math.sqrt((n - 2)/(1 - correlation**2))
    # Compute p-value for two-sided t-test.
    return 2 * stats.t_cdf(-abs(t), n - 2)
The function correlation_pearsonr2 returns the square of the Pearson correlation coefficient, r^2 (I verified this against scipy's pearsonr, whose squared value agrees with bayeslite's). But correlation_p_pearsonr2 then squares that value again when computing the t statistic, so the term under the square root uses r to the 4th power. I believe this is incorrect: the formula from Wikipedia expects the unsquared correlation coefficient r.
Fixing this would also explain why I found scipy's pearsonr to give the same correlation value but wildly different p-values. Of course, I might just have misunderstood what the code is doing.
From https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
t = r\sqrt{\frac{n-2}{1 - r^2}}
From correlation_p_pearsonr2, where correlation already holds r^2:
t = correlation * math.sqrt((n - 2)/(1 - correlation**2))
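To make the discrepancy concrete, here is a minimal standard-library sketch that evaluates the Wikipedia t formula twice: once with the unsquared r, and once with r^2 passed in, which is what I understand correlation_p_pearsonr2 to be doing. The helper names pearson_r and t_statistic and the sample data are hypothetical, chosen for illustration; they are not part of bayeslite or scipy.

```python
import math


def pearson_r(xs, ys):
    """Plain (unsquared) Pearson correlation coefficient r."""
    n = len(xs)
    assert n == len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), the formula quoted from Wikipedia."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Made-up sample data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 1, 4, 3, 7, 5, 8, 6]
n = len(xs)

r = pearson_r(xs, ys)

# What the formula expects: the unsquared coefficient.
t_expected = t_statistic(r, n)

# What happens if r^2 is passed in instead, so the formula sees r^2 and r^4.
t_from_squared = t_statistic(r ** 2, n)

print("r              =", r)
print("t using r      =", t_expected)
print("t using r**2   =", t_from_squared)
```

Because the two-sided p-value is computed from |t| with n - 2 degrees of freedom, a smaller t gives a larger p-value, which would be consistent with the mismatch against scipy.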