Calculates the entropy of the frequency distribution of entities per taxonomy category, plus a chi-square test statistic that compares those frequencies to a uniform distribution. I think these metrics are more relevant than skewness because we aren't expecting the frequency distributions to be normal. In theory, I think we want to compare against a uniform distribution (i.e. all categories have the same number of entities).
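For reviewers, here is a rough sketch of what the two metrics measure. It is not the code in this PR: the function names `category_entropy` and `chi_square_vs_uniform` and the counts are made up for illustration, and it uses only the standard library.

```python
import math
from collections import Counter


def category_entropy(counts):
    """Shannon entropy (in bits) of a list of per-category entity counts."""
    total = sum(counts)
    # Empty categories contribute nothing to the entropy.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def chi_square_vs_uniform(counts):
    """Chi-square statistic of observed counts against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical data: one taxonomy category label per entity.
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4 + ["d"] * 4
uniform_counts = list(Counter(labels).values())  # every category has 4 entities
skewed_counts = [13, 1, 1, 1]                    # one category dominates

print(category_entropy(uniform_counts), chi_square_vs_uniform(uniform_counts))
print(category_entropy(skewed_counts), chi_square_vs_uniform(skewed_counts))
```

With perfectly even categories the entropy reaches its maximum (log2 of the number of categories) and the chi-square statistic is zero; the more the counts concentrate in a few categories, the lower the entropy and the larger the statistic.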
Fixes #41
Checklist:
[X] I have refactored my code out from notebooks/
[X] I have checked the code runs
[X] I have tested the code
[ ] I have run pre-commit and addressed any issues not automatically fixed
[X] I have merged any new changes from dev
[ ] I have documented the code
[X] Major functions have docstrings
[ ] Appropriate information has been added to READMEs