I'm working on a text classification problem similar to the sentiment_imdb problem, with 4 classes. My training set is highly imbalanced and I'd like to use balanced class weights for training. What's the best way to do that?
In general, if you have an imbalanced dataset you can do one of two things:

- If you expect the real data to be distributed in a similar way to your training set, you should not alter the training set; the model will then reflect the natural class frequencies.
- If instead you want to give every class equal importance, you can undersample the majority classes or oversample the minority ones (see the SMOTE technique), so that in the end every class is equally represented in your dataset (see the sketch below).
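Here is a minimal sketch of the second option, assuming scikit-learn and imbalanced-learn are installed; the toy texts, the four integer labels, and the logistic-regression classifier are placeholders for your own data and model. SMOTE interpolates between examples in feature space, so the text has to be vectorized first.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

# Placeholder imbalanced corpus: class 0 dominates, classes 1-3 are minorities.
texts = [
    "loved it", "great movie", "fantastic acting", "really enjoyed it",
    "terrible plot", "awful film",
    "it was okay", "nothing special",
    "not sure what to think", "mixed feelings",
]
labels = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3]

# Turn raw text into numeric features; SMOTE needs vectors, not strings.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Oversample the minority classes so every class ends up with as many
# samples as the majority class. k_neighbors is kept small here only
# because the toy classes are tiny.
X_res, y_res = SMOTE(k_neighbors=1, random_state=0).fit_resample(X, labels)

# Train any classifier on the rebalanced data.
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
```

After `fit_resample`, every class has the same number of samples, so a plain classifier trained on the resampled data treats the four classes with equal importance without needing explicit class weights.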