hpcc-systems / DataPatterns

HPCC Systems ECL bundle that provides some basic data profiling and research tools to an ECL programmer

Add function for Benford's Law test on a column #66

Closed: JamesDeFabia closed this issue 4 years ago

JamesDeFabia commented 4 years ago

Benford’s law can help detect fraud and data errors, because the leading digits of many large, naturally occurring sets of numbers are expected to follow it. I think it would be a great new feature for the DataPatterns library: a function that takes a column of a dataset and reports whether it passes the Benford test.

It would also be nice to be able to test specified columns for a Benford distribution in ECL Watch when analyzing data files. This should be optional, because the test does not apply to all data columns (or even to all numeric data).
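For illustration only, here is a minimal sketch of the kind of column check requested above. It is written in Python rather than ECL and is not the DataPatterns implementation. The names `leading_digit` and `benford_test` are hypothetical, and using a chi-squared goodness-of-fit test at the 5% level as the pass/fail rule is an assumption; other criteria are possible.

```python
import math
from collections import Counter
from decimal import Decimal

# Expected first-digit proportions under Benford's law: P(d) = log10(1 + 1/d)
BENFORD_EXPECTED = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-squared critical value for 8 degrees of freedom at the 5% level
# (assumed pass/fail threshold for this sketch)
CHI2_CRITICAL_DF8_05 = 15.507


def leading_digit(value):
    """Return the first nonzero digit (1-9) of a number, or None for zero, NaN, or infinity."""
    digits = Decimal(str(value)).as_tuple().digits
    return next((d for d in digits if d != 0), None)


def benford_test(column):
    """Compare a column's first-digit frequencies with Benford's law (hypothetical helper)."""
    digits = [d for d in (leading_digit(v) for v in column) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("column has no nonzero numeric values")
    counts = Counter(digits)
    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
    chi2 = sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in BENFORD_EXPECTED.items()
    )
    observed = {d: counts.get(d, 0) / n for d in range(1, 10)}
    return {
        "n": n,
        "observed": observed,
        "chi2": chi2,
        "passes": chi2 <= CHI2_CRITICAL_DF8_05,
    }


# Powers of 2 are a classic Benford-like sequence; consecutive integers are not.
benford_like = benford_test([2 ** k for k in range(1000)])
uniform_like = benford_test(range(1, 1000))
print(benford_like["passes"], uniform_like["passes"])
```

A chi-squared check like this is only meaningful for reasonably large columns, and with very large columns it can reject data that deviates only slightly, so a real implementation might report the per-digit frequencies alongside the verdict.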

dcamper commented 4 years ago

Additional info:

https://www.statisticshowto.com/benfords-law/

https://en.wikipedia.org/wiki/Benford%27s_law

dcamper commented 4 years ago

Added to candidate-1.6.6.