Check for class-imbalance and its impact

googleinterns / amaranth

Apache License 2.0


Check for class-imbalance and its impact #15

Closed (tommylau-exe closed 4 years ago)

tommylau-exe commented 4 years ago

This PR contains code that was used to calculate how balanced the calorie labels (classes) are in the data set with our current thresholds. The results are as follows:

Low-calorie: 25%
Average-calorie: 63%
High-calorie: 12%
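
A minimal sketch of how a breakdown like this could be computed, assuming each example carries a single calorie value and that two thresholds split the range into three classes. The threshold values, the function names `calorie_label` and `class_distribution`, and the sample data are all hypothetical; the code actually pushed in this PR may be structured differently.

```python
from collections import Counter

# Hypothetical thresholds; the project's real values are not stated in this thread.
LOW_THRESHOLD = 300
HIGH_THRESHOLD = 700


def calorie_label(calories, low=LOW_THRESHOLD, high=HIGH_THRESHOLD):
    """Map a calorie value to one of three class labels."""
    if calories < low:
        return "low"
    if calories > high:
        return "high"
    return "average"


def class_distribution(calorie_values):
    """Return each label's share of the data set as a fraction in [0, 1]."""
    labels = [calorie_label(c) for c in calorie_values]
    total = len(labels)
    if total == 0:
        # Avoid dividing by zero on an empty data set.
        return {"low": 0.0, "average": 0.0, "high": 0.0}
    counts = Counter(labels)
    return {label: counts[label] / total for label in ("low", "average", "high")}


# Made-up sample values, for illustration only.
sample = [120, 250, 310, 450, 500, 640, 700, 820]
for label, share in class_distribution(sample).items():
    print(f"{label}-calorie: {share:.1%}")
```

Changing either threshold and rerunning shows how sensitive the class balance is to where the cut-offs sit.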

tommylau-exe commented 4 years ago

For posterity, the current F1 score for each category (as calculated by the recently pushed code) is roughly as follows:

Low-calorie: 0.81
Average-calorie: 0.90
High-calorie: 0.65

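These scores seem to track each class's representation in the data set pretty well: the most represented class (average-calorie, 63%) scores highest and the least represented (high-calorie, 12%) scores lowest.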
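
For reference, a per-class F1 score is conventionally the harmonic mean of that class's precision and recall, computed one-vs-rest. The definition below is the standard one and is assumed here; it is not taken from the pushed code. For a class c, TP_c, FP_c and FN_c are its true positives, false positives and false negatives.

```latex
\[
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad
R_c = \frac{TP_c}{TP_c + FN_c}, \qquad
F1_c = \frac{2 P_c R_c}{P_c + R_c} = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}
\]
```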

tommylau-exe commented 4 years ago

In addition, the confusion matrix generated by the recently pushed code looks roughly like the following:

| | Predicted Low-Calorie | Predicted Average-Calorie | Predicted High-Calorie |
| --- | ---: | ---: | ---: |
| Actual Low-Calorie | 10765 | 2437 | 52 |
| Actual Average-Calorie | 2065 | 32897 | 979 |
| Actual High-Calorie | 74 | 2100 | 2977 |
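
As a rough cross-check, per-class precision, recall and F1 can be derived directly from the confusion matrix above. The sketch below assumes rows are actual classes and columns are predicted classes, as the table labels suggest; the function name `per_class_metrics` is hypothetical and this is not the code from the PR.

```python
LABELS = ["low", "average", "high"]

# Rows are actual classes, columns are predicted classes
# (values copied from the confusion matrix in this comment).
CONFUSION = [
    [10765, 2437, 52],
    [2065, 32897, 979],
    [74, 2100, 2977],
]


def per_class_metrics(matrix, labels):
    """Compute one-vs-rest precision, recall and F1 for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        actual_total = sum(matrix[i])                     # row sum
        predicted_total = sum(row[i] for row in matrix)   # column sum
        precision = true_positives / predicted_total if predicted_total else 0.0
        recall = true_positives / actual_total if actual_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


for label, m in per_class_metrics(CONFUSION, LABELS).items():
    print(
        f"{label}-calorie: precision={m['precision']:.2f} "
        f"recall={m['recall']:.2f} f1={m['f1']:.2f}"
    )
```

The F1 values computed this way land close to the rough figures reported earlier in the thread. The matrix also shows where the high-calorie class loses ground: only about 58% of actual high-calorie examples are predicted as high-calorie, and most of the rest (2100 of 5151) are predicted as average-calorie.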