Description

Calculates the entropy of the frequency distribution of entities per taxonomy category, and a chi-square test statistic that compares the observed frequencies to a uniform distribution. I think these metrics are more relevant than skewness because we aren't expecting the frequency distributions to be normal. In theory, I think we want to compare against a uniform distribution (i.e. all categories have the same number of entities).
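For reviewers, here is a minimal sketch of the two metrics, using only the standard library. It is not the code in this PR: the function names `entropy_bits` and `chi_square_uniform` and the toy `labels` data are hypothetical, and the sketch assumes entropy is measured in bits and that the chi-square statistic is the usual Pearson form with an equal expected count per category.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (in bits) of a frequency distribution given as raw counts."""
    total = sum(counts)
    # Zero-count categories contribute nothing to the entropy, so skip them.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def chi_square_uniform(counts):
    """Pearson chi-square statistic against a uniform expected distribution."""
    total = sum(counts)
    # Under uniformity, every category is expected to hold the same number of entities.
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical toy data: one category label per entity.
labels = ["a", "a", "a", "a", "b", "b", "c", "c"]
counts = list(Counter(labels).values())

skewed_entropy = entropy_bits(counts)
skewed_chi2 = chi_square_uniform(counts)

# A perfectly uniform distribution maximises entropy and gives a statistic of 0.
uniform_entropy = entropy_bits([3, 3, 3])
uniform_chi2 = chi_square_uniform([3, 3, 3])
```

Entropy is highest, at log2 of the number of categories, when entities are spread evenly, while the chi-square statistic grows as the counts move away from the uniform expectation, so the two metrics read in opposite directions.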

Fixes #41

Checklist:

- [X] I have refactored my code out from notebooks/
- [X] I have checked the code runs
- [X] I have tested the code
- [ ] I have run pre-commit and addressed any issues not automatically fixed
- [X] I have merged any new changes from dev
- [ ] I have documented the code
  - [X] Major functions have docstrings
  - [ ] Appropriate information has been added to READMEs
- [X] I have explained this PR above
- [X] I have requested a code review

nestauk / dap_aria_mapping

41 skewness #49
