jackscott / python-statlib

Exported from google.code, needed to get this crusty code working in a modern world
https://archive.org/web/

stats.kendalltau doesn't match scipy.stats.kendalltau or R cor function with method="kendall" #18

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?

$ pip install statlib  # currently installs v1.1
$ python
>>> from statlib.stats import kendalltau
>>> kendalltau([1,2,3,4,5,6,7,8], [2,1,6,8,5,3,7,4])[0]
0.19166296949998196

What is the expected output? What do you see instead?

It seems reasonable to match other libraries (especially R):

scipy.stats > kendalltau([1,2,3,4,5,6,7,8], [2,1,6,8,5,3,7,4])[0]
0.214285714286
R > cor(c(1,2,3,4,5,6,7,8), c(2,1,6,8,5,3,7,4), method="kendall")
0.214286

What version of the product are you using? On what operating system?
statlib 1.1 with python 2.7 on Mac OS 10.8

Please provide any additional information below.

I notice that the code references Numerical Recipes. After checking NR, I think 
the problem is that the Python version's inner loop introduces a tie by 
accident.

Currently the inner loop is:

   for k in range(j,len(y)):

This seems to cause a mistaken tie: k equals j on the first iteration, so the 
element is compared with itself, both differences are zero, and the tie logic 
is triggered.

The following form matches the Numerical Recipes code more closely, and a quick 
test shows that with it the result matches scipy and R:

   for k in range(j + 1,len(y)):
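
To make the difference between the two loop bounds concrete, here is a minimal sketch of a Numerical Recipes style tau. It is not the statlib source: the function name `kendall_tau_sketch` and the `skip_self` flag are hypothetical, and the tie branch (a pair with no difference in x is counted on the y side only) is an assumption, chosen because it is consistent with the 0.1916... value reported above.

```python
import math


def kendall_tau_sketch(x, y, skip_self=True):
    """Kendall's tau via a double loop over pairs (illustrative sketch).

    skip_self=True  starts the inner loop at j + 1 (proposed fix).
    skip_self=False starts the inner loop at j (current behaviour),
    so every element is also compared with itself.
    """
    n1 = 0  # pairs counted on the x side
    n2 = 0  # pairs counted on the y side
    s = 0   # concordant pairs minus discordant pairs
    offset = 1 if skip_self else 0
    for j in range(len(x) - 1):
        for k in range(j + offset, len(y)):
            a1 = x[j] - x[k]
            a2 = y[j] - y[k]
            aa = a1 * a2
            if aa:  # no tie in either list
                n1 += 1
                n2 += 1
                s += 1 if aa > 0 else -1
            elif a1:  # x differs, so the tie is in y
                n1 += 1
            else:  # assumed tie branch: x does not differ
                n2 += 1
    return s / math.sqrt(n1 * n2)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 6, 8, 5, 3, 7, 4]
fixed = kendall_tau_sketch(x, y, skip_self=True)
buggy = kendall_tau_sketch(x, y, skip_self=False)
print(fixed, buggy)
```

For this data there are 17 concordant and 11 discordant pairs out of 28, so the fixed loop gives 6/28. Starting at j adds 7 self-comparisons to the y-side count, which turns the denominator into sqrt(28 * 35) and lowers the result.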

Not sure if this link will load, but it goes to a version of NR:

http://books.google.com/books?id=3-BfpBw7AqQC&pg=PA326&lpg=PA326&dq=function+kendl1+in+Numerical+Recipes&source=bl&ots=hzuUtoMff7&sig=AKcKDiKTm-bDG0l_ORk6X5Gcymc&hl=en&sa=X&ei=A8NKUe_HFqmciQKF7YDQDg&ved=0CDAQ6AEwAA#v=onepage&q=function%20kendl1%20in%20Numerical%20Recipes&f=false

Original issue reported on code.google.com by ja...@reyagroup.com on 21 Mar 2013 at 8:41