googleinterns / amaranth

Apache License 2.0

Check for class imbalance and its impact #15

Closed (tommylau-exe closed 4 years ago)

tommylau-exe commented 4 years ago

This PR contains code that was used to calculate how balanced the calorie labels (classes) are in the data set with our current thresholds. The results are as follows:

tommylau-exe commented 4 years ago

For posterity, the current F1 Score for each category (as calculated by the recently pushed code) is roughly as follows:

Seems to correlate pretty well with representation in the data set.

tommylau-exe commented 4 years ago

In addition, the confusion matrix generated by the recently pushed code looks roughly like the following:

| | Predicted Low-Calorie | Predicted Average-Calorie | Predicted High-Calorie |
| --- | --- | --- | --- |
| Actual Low-Calorie | 10765 | 2437 | 52 |
| Actual Average-Calorie | 2065 | 32897 | 979 |
| Actual High-Calorie | 74 | 2100 | 2977 |
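
For anyone who wants to re-derive per-class numbers from the confusion matrix above, here is a minimal sketch in plain Python. It is not the code from this PR. The names `LABELS`, `MATRIX`, and `per_class_metrics` are made up for illustration, and it assumes rows are actual classes and columns are predicted classes, as the row and column labels suggest. Because the counts are described as rough, the output should be read as approximate too.

```python
# Counts copied from the confusion matrix in this thread.
# Assumption: rows are actual classes, columns are predicted classes.
LABELS = ["Low-Calorie", "Average-Calorie", "High-Calorie"]
MATRIX = [
    [10765, 2437, 52],
    [2065, 32897, 979],
    [74, 2100, 2977],
]


def per_class_metrics(matrix, labels):
    """Return support, share, precision, recall and F1 for each class."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        actual = sum(matrix[i])                    # row total: examples of this class
        predicted = sum(row[i] for row in matrix)  # column total: predictions of this class
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "support": actual,
            "share": actual / total if total else 0.0,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return metrics


results = per_class_metrics(MATRIX, LABELS)
for label, m in results.items():
    print(
        f"{label}: support={m['support']} share={m['share']:.3f} "
        f"precision={m['precision']:.3f} recall={m['recall']:.3f} f1={m['f1']:.3f}"
    )
```

On these counts the classes rank the same way by F1 as by row total (Average-Calorie, then Low-Calorie, then High-Calorie), which is consistent with the earlier remark that F1 tracks representation. The row totals describe only the examples that were evaluated, so they may not match the class balance of the full data set.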