utterances-bot opened 1 year ago
When you perform an imbalance correction on your data, you have to recalibrate your model's predicted probabilities. This is typically done with Platt scaling or isotonic regression. The imbalance correction plus recalibration may not be necessary for logistic regression in the first place, but in many cases it should improve the predictive performance of tree/boosted/SVC/DL models.
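As a minimal sketch of what that recalibration step can look like (my own illustration, not from the post), here is scikit-learn's `CalibratedClassifierCV` applied to a boosted-tree model on a hypothetical imbalanced dataset built with `make_classification`; swap `method="sigmoid"` (Platt scaling) for `method="isotonic"` to use isotonic regression instead:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset (~5% positives).
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Stand-in for whatever model gets fit on rebalanced data; its raw
# predict_proba output will typically be miscalibrated after rebalancing.
base = GradientBoostingClassifier(random_state=0)

# Platt scaling: a sigmoid is fit on held-out folds to map the model's
# scores back to well-calibrated probabilities.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

p_cal = calibrated.predict_proba(X_test)[:, 1]
```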
You might also be interested in the Spiegelhalter z statistic for testing model calibration, and in the ICI, E50, E90, and Emax metrics for quantifying the amount of miscalibration.
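A rough sketch of those metrics (again my own, under the assumption that a lowess smoother is used for the flexible calibration curve, with an arbitrary span `frac`); `p` are predicted probabilities and `y` are 0/1 outcomes, e.g. `y_test` and `p_cal` from the snippet above:

```python
import numpy as np
from scipy.stats import norm
from statsmodels.nonparametric.smoothers_lowess import lowess

def spiegelhalter_z(y, p):
    # Spiegelhalter (1986): z = sum((y - p)(1 - 2p)) / sqrt(sum((1 - 2p)^2 p (1 - p)))
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    num = np.sum((y - p) * (1 - 2 * p))
    den = np.sqrt(np.sum((1 - 2 * p) ** 2 * p * (1 - p)))
    z = num / den
    return z, 2 * (1 - norm.cdf(abs(z)))  # statistic and two-sided p-value

def calibration_errors(y, p, frac=0.75):
    # Smooth observed outcomes against predicted probabilities, then summarize
    # the absolute gap between the smoothed curve and the predictions.
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    smoothed = lowess(y, p, frac=frac, return_sorted=False)
    abs_err = np.abs(p - smoothed)
    return {
        "ICI": abs_err.mean(),              # integrated calibration index
        "E50": np.percentile(abs_err, 50),  # median absolute error
        "E90": np.percentile(abs_err, 90),  # 90th percentile
        "Emax": abs_err.max(),              # worst-case miscalibration
    }
```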
@ercbk that's a great point and I agree! Platt scaling especially is a thing we do regularly to recalibrate our probabilities ex-post, and these methods are a very effective way of getting probability predictions that are closer to the ground truth.
But they have their own pitfalls. One of them is that they aren't nearly as well-known as class imbalance "problems" are, so very often I'll see people balancing classes while paying no attention to how the calibration of their predictions changes as a result.
But regardless, strongly agree with your point -- using something like a tree booster + SMOTE + a method of re-calibrating ex-post can yield great results!
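One way to wire that combination together (an assumption on my part, not necessarily the post's exact workflow) is to put SMOTE and a boosted-tree model in an imbalanced-learn pipeline and wrap the whole thing in `CalibratedClassifierCV`, so the recalibration is learned on held-out folds the resampled model never saw:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier

smote_booster = Pipeline([
    ("smote", SMOTE(random_state=0)),                       # oversample the minority class
    ("model", GradientBoostingClassifier(random_state=0)),  # boosted trees on resampled data
])

# Platt scaling fit on out-of-fold predictions from the SMOTE + booster pipeline.
calibrated = CalibratedClassifierCV(smote_booster, method="sigmoid", cv=5)
# calibrated.fit(X_train, y_train)
# p_cal = calibrated.predict_proba(X_test)[:, 1]
```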
I'll add an addendum to the post about how this can also be a perfectly valid use of class rebalancing.
Matt Kaye - Balancing Classes in Classification Problems
Matt Kaye’s personal website.
https://matthewrkaye.com/posts/2023-03-25-balancing-classes/balancing-classes.html