target / huntlib

A Python library to help with some common threat hunting data analysis operations
MIT License

Benfords returns chi2 higher than 1.0 #16

Closed. CMiksche closed this issue 4 years ago.

CMiksche commented 4 years ago

Describe the bug: After reading the documentation, I expected benfords to return a chi2 value between 0.0 and 1.0, but when testing with high numbers I get chi2 values above 1.0.

To Reproduce: Download the current version from PyPI and test the following input:

huntlib.util.benfords([234,317,211,92])
huntlib.util.benfords([235634643])
huntlib.util.benfords([123])
huntlib.util.benfords([9,9,9])

Expected behavior: A maximum chi2 value of 1.0.

Terminal output:

>>> huntlib.util.benfords([234,317,211,92])
(2.279150197628459, 0.9712356424435329, 1    0.00
2    0.50
3    0.25
4    0.00
5    0.00
6    0.00
7    0.00
8    0.00
9    0.25
Name: digits, dtype: float64)
>>> huntlib.util.benfords([235634643])
(4.681818181818183, 0.7909792203003781, 1    0.0
2    1.0
3    0.0
4    0.0
5    0.0
6    0.0
7    0.0
8    0.0
9    0.0
Name: digits, dtype: float64)
>>> huntlib.util.benfords([123])
(2.3222591362126246, 0.9695046717201476, 1    1.0
2    0.0
3    0.0
4    0.0
5    0.0
6    0.0
7    0.0
8    0.0
9    0.0
Name: digits, dtype: float64)
>>> huntlib.util.benfords([9,9,9])
(20.73913043478261, 0.007873655429909338, 1    0.0
2    0.0
3    0.0
4    0.0
5    0.0
6    0.0
7    0.0
8    0.0
9    1.0
Name: digits, dtype: float64)

I know this issue probably won't occur on larger, more realistic datasets, but either the documentation or the handling of these cases should be changed.

DavidJBianco commented 4 years ago

Thanks for catching this. The chi2 range is in fact 0 to infinity, so I corrected the documentation.
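
For readers who want to see where the numbers in the terminal output come from, here is a minimal standard-library sketch. It is not huntlib's implementation. It assumes the statistic is the sum of (observed - expected)^2 / expected over the first-digit proportions, with Benford probabilities rounded to three decimals, and that the second value in the tuple is a chi-square p-value with 8 degrees of freedom. Those assumptions were chosen only because they reproduce the figures reported above; the helper names are hypothetical.

```python
import math
from collections import Counter

# Benford first-digit probabilities rounded to three decimals.
# ASSUMPTION: this rounding is inferred from the reported output, not from huntlib's source.
BENFORD = {1: 0.301, 2: 0.176, 3: 0.125, 4: 0.097, 5: 0.079,
           6: 0.067, 7: 0.058, 8: 0.051, 9: 0.046}


def first_digit_proportions(numbers):
    """Return the share of inputs whose leading digit is 1..9 (nonzero integers only)."""
    digits = [int(str(abs(int(n)))[0]) for n in numbers]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def chi2_statistic(numbers):
    """Chi-square statistic of observed vs. expected first-digit proportions."""
    observed = first_digit_proportions(numbers)
    return sum((observed[d] - BENFORD[d]) ** 2 / BENFORD[d] for d in BENFORD)


def chi2_p_value(stat, dof=8):
    """Chi-square survival function, using the closed form valid for even dof."""
    half = stat / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


stat = chi2_statistic([9, 9, 9])
print(stat)                # about 20.739, well above 1.0
print(chi2_p_value(stat))  # about 0.0079, inside [0, 1]
```

Under these assumptions, the statistic is the value that can exceed 1.0, while the second element of the tuple stays between 0 and 1, which may be the range the original documentation had in mind.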