jackscott / python-statlib

Exported from google.code, needed to get this crusty code working in a modern world
https://archive.org/web/
Other

lpercentileofscore and lscoreatpercentile #15

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
stats.lpercentileofscore() and stats.lscoreatpercentile() are completely broken, 
and the results are absurd: lpercentileofscore() accepts values slightly above the 
maximum, and lscoreatpercentile() returns nonexistent values (such as negative values 
for a low percentile like 10).

A better, working implementation of lscoreatpercentile can be found here:
http://www.goldb.org/corestats.html
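For comparison, here is a minimal sketch of score-at-percentile that uses linear interpolation between the two closest ranks (the rule NumPy's `percentile` uses by default). This is not the corestats code, which is not reproduced here. `score_at_percentile` is a hypothetical name, and `percent` is assumed to be on a 0 to 100 scale. Because the result is always a weighted average of two observed values, it cannot fall outside the range of the data.

```python
def score_at_percentile(values, percent):
    """Value below which roughly `percent` % of `values` fall (0 <= percent <= 100).

    Hypothetical helper: linear interpolation between the two closest ranks,
    the same rule as NumPy's default percentile method.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    data = sorted(values)
    # Fractional index of the percentile within the sorted data.
    pos = (len(data) - 1) * percent / 100.0
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(data) - 1)    # neighbour above, clamped at the end
    frac = pos - lo
    # Weighted average of two observed values, so the result stays in [min, max].
    return data[lo] + (data[hi] - data[lo]) * frac


print(score_at_percentile([1, 2, 3, 4, 5], 25))  # 2.0
```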

I'm trying to write an lpercentileofscore() based on the link above.

I'm on Windows 7 using Python 2.6.

By the way, very cool project. I used the original stats.py and pstats.py before 
I found your project homepage, and I must say that it's great to see someone 
taking up these great small and portable libraries (it avoids the need to 
package the massive NumPy in every package with very small math needs).

Original issue reported on code.google.com by grosb...@gmail.com on 9 Sep 2010 at 2:06

GoogleCodeExporter commented 8 years ago
For a few weeks I won't have time to look at this, but usually implementations 
like the one you mention above (that I looked at) are overly simplistic.

Even the concept of a median is more complicated than one naively assumes. 
Incorrect results usually mean that the input data is such that this 
statistical measure may not make sense in the first place, for example taking 
the 10th percentile of, say, 3 elements. That is what I suspect your problem 
is. 
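To illustrate the small-sample problem, here is a sketch of the nearest-rank definition of a percentile, which is only one of several competing definitions. `nearest_rank_percentile` is a hypothetical name. With only 3 elements, every percentile from 0 up to about 33.3 maps to the same smallest element, so a "10th percentile" carries very little information.

```python
import math


def nearest_rank_percentile(values, percent):
    """Nearest-rank percentile (hypothetical helper, 0 <= percent <= 100).

    Returns the smallest observed value such that at least `percent` % of
    the data is less than or equal to it.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    data = sorted(values)
    # 1-based rank; percent == 0 is mapped to the smallest element.
    rank = max(1, int(math.ceil(percent / 100.0 * len(data))))
    return data[rank - 1]


# With 3 elements the 10th, 20th and 30th percentiles are all the minimum.
print([nearest_rank_percentile([4, 8, 15], p) for p in (10, 20, 30, 34, 100)])
# [4, 4, 4, 8, 15]
```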

In statlib you will get incorrect results if the original data is such that the 
concept that you want does not make sense. 

Original comment by istvan.a...@gmail.com on 9 Sep 2010 at 2:17

GoogleCodeExporter commented 8 years ago
Thank you for your quick reply sir,

I know that this implementation is overly simplistic and not optimized; that's 
why I only use it for the two functions that are broken in stats.py.

As for my usage of the functions: even though I'm not a maths expert, I have studied most 
of the statistical methods and techniques covered in stats.py, and I only use 
the ones I understand and know (at first I was implementing these 
functions myself, then I stumbled upon this library and it did the job 
much better than my own functions, except for a few like the mode).

I considered the possibility that my original data does not fit this 
particular function, but normally it should: I have a simple list (a one-dimensional 
array) of numbers, and I just want to know the percentile corresponding to the 
number of occurrences of a certain value, and vice versa.
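A minimal sketch of a percentile-of-score (percentile rank) function, assuming the convention of counting values strictly below the score plus half of the values equal to it. SciPy's `scipy.stats.percentileofscore` calls this convention `kind='mean'`, but other conventions exist and give different numbers for the same input. `percentile_of_score` is a hypothetical name. Since it only counts observations, a score above the maximum yields exactly 100 and never more.

```python
def percentile_of_score(values, score):
    """Percentile rank of `score` within `values`, on a 0-100 scale.

    Hypothetical helper: counts values strictly below `score` plus half of
    the values equal to it (SciPy calls this convention kind='mean').
    """
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < score)
    equal = sum(1 for v in values if v == score)
    return 100.0 * (below + 0.5 * equal) / len(values)


print(percentile_of_score([1, 2, 3, 4], 3))   # 62.5
print(percentile_of_score([1, 2, 3, 4], 10))  # 100.0, never above 100
```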

Note that all the other functions I tested in stats.py work 
quite well; only these two return very weird results.

Attached is my (basic and simple) version of percentile-from-value.

Original comment by grosb...@gmail.com on 9 Sep 2010 at 4:58


GoogleCodeExporter commented 8 years ago
This error comes from the line
'score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)',
where cumhist[i-1] wraps around to the last bin (index -1 in Python) when the 
score falls in the first bin.
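The wraparound is easy to reproduce in isolation. The sketch below re-implements the quoted interpolation on a hypothetical histogram (bin counts `[2, 3, 5]`, lower real limit 0, bin size 10). The loop around the quoted formula is a reconstruction for illustration, not statlib's verbatim code, and `buggy_score` / `fixed_score` are hypothetical names. The fix is to treat the cumulative count below the first bin as 0 instead of reading `cumhist[-1]`.

```python
def cumulative(counts):
    """Running totals of bin counts, e.g. [2, 3, 5] -> [2, 5, 10]."""
    totals, running = [], 0
    for c in counts:
        running += c
        totals.append(running)
    return totals


def buggy_score(counts, lrl, binsize, percent):
    """Mirrors the quoted formula, including the i-1 wraparound."""
    cumhist = cumulative(counts)
    targetcf = percent / 100.0 * sum(counts)
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf:
            break
    # When i == 0, cumhist[i-1] is cumhist[-1], i.e. the grand total.
    return binsize * ((targetcf - cumhist[i - 1]) / float(counts[i])) + (lrl + binsize * i)


def fixed_score(counts, lrl, binsize, percent):
    """Same interpolation with 0 observations below the first bin.

    Assumes at least one observation; empty bins are skipped so that the
    division below never uses a zero count.
    """
    cumhist = cumulative(counts)
    targetcf = percent / 100.0 * sum(counts)
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf and counts[i] > 0:
            break
    # Observations in all bins before bin i (none before the first bin).
    below = cumhist[i - 1] if i > 0 else 0
    return binsize * ((targetcf - below) / float(counts[i])) + (lrl + binsize * i)


print(buggy_score([2, 3, 5], 0, 10, 10))  # -45.0
print(fixed_score([2, 3, 5], 0, 10, 10))  # 5.0
```

Both versions agree whenever the target falls beyond the first bin; only the first-bin case differs, which is consistent with the negative scores reported for low percentiles.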

Original comment by Developm...@JivanAmara.net on 29 Sep 2010 at 3:25

GoogleCodeExporter commented 8 years ago
It is obviously broken; you should really fix it.

Original comment by vaisvi...@gmail.com on 15 Feb 2011 at 12:59