wireservice / agate

A Python data analysis library that is optimized for humans instead of machines.
https://agate.readthedocs.io
MIT License

Cookbook: aggregations.BenfordsLaw #105

Closed by onyxfish 9 years ago

onyxfish commented 10 years ago

http://latimes-calculate.readthedocs.org/en/latest/basicfunctions.html#benford-s-law

mickaobrien commented 10 years ago

Were you thinking of something like the latimes-calculate implementation, i.e. one that returns the Pearson correlation coefficient between the actual first-digit frequencies and those predicted by Benford's Law?
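
To make the idea concrete, here is a minimal sketch of that approach in plain Python. It is not the latimes-calculate or agate code; the function names (`first_digit`, `benford_expected`, `pearson_correlation`, `benford_correlation`) are hypothetical, and skipping zeros is an assumption, made because zero has no leading digit.

```python
import math


def first_digit(value):
    """Return the leading non-zero digit of a number, or None for zero."""
    if value == 0:
        return None
    # Scientific notation always puts the leading digit first, e.g. '8.73e+02'.
    return int(f"{abs(value):.15e}"[0])


def benford_expected():
    """Proportions Benford's Law predicts for the leading digits 1 through 9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def pearson_correlation(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


def benford_correlation(values):
    """Correlate observed first-digit proportions with Benford's prediction."""
    digits = [d for d in map(first_digit, values) if d is not None]
    observed = [digits.count(d) / len(digits) for d in range(1, 10)]
    return pearson_correlation(observed, benford_expected())


# Powers of two are a classic example of data that follows Benford's Law closely.
powers_of_two = [2 ** n for n in range(100)]
print(benford_correlation(powers_of_two))
```

Keeping `pearson_correlation` as its own function means it can be reused outside the Benford check.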

onyxfish commented 10 years ago

That seems right. I've never had the opportunity to read up on Benford's Law before, so I'm not sure if there are variations on how it's implemented, but as a general rule of thumb, doing what the LAT crew does hasn't led me astray. As part of this, you could easily factor pearson_correlation out into a more general helper, as I did with _median.

mickaobrien commented 10 years ago

Yeah, splitting out the pearson_correlation makes sense.

I'll start work on this so.

mickaobrien commented 10 years ago

I've been looking into this a bit and it seems like positive and negative numbers should generally be treated separately when testing for Benford's Law (http://www.bcasonline.org/articles/artin.asp?810).

In this R implementation (PDF), they pass a sign parameter that allows you to specify whether you analyse numbers that are positive, negative or both. I was planning on doing the same, defaulting to positive. What do you think?
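
A rough sketch of how such a sign option might look in Python follows. The helper name `filter_by_sign` and the option strings `"positive"`, `"negative"` and `"both"` are assumptions for illustration, not the R package's actual interface; zeros are dropped in every mode because they have no leading digit.

```python
def filter_by_sign(values, sign="positive"):
    """Select the values to include in a Benford test, by sign.

    sign: "positive" (the default), "negative", or "both".
    """
    if sign == "positive":
        return [v for v in values if v > 0]
    if sign == "negative":
        return [v for v in values if v < 0]
    if sign == "both":
        # Zero is excluded here too, since it has no leading digit.
        return [v for v in values if v != 0]
    raise ValueError(f"sign must be 'positive', 'negative' or 'both', got {sign!r}")


amounts = [120.5, -33.0, 0, 7, -904]
print(filter_by_sign(amounts))              # positive values only
print(filter_by_sign(amounts, "negative"))  # negative values only
print(filter_by_sign(amounts, "both"))      # everything except zero
```

The filtered list would then be passed to whatever first-digit test is used, so the sign handling stays separate from the statistics.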

onyxfish commented 9 years ago

Migrated to agate-stats.