Do we need labels when the existing labels are class-imbalanced (some classes have many more labeled examples than others) and we have a lot of unlabeled data?
Positive. Yes, we need labels: self-train on the unlabeled data and you are golden. (Self-training is a process where an intermediate model, trained on the human-labeled data, is used to create labels for the unlabeled data (thus, pseudo labels), and the final model is then trained on both the human-labeled and the pseudo-labeled data.)
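The self-training loop above can be sketched in a few lines. This is a toy illustration on made-up 1-D data (the data, the 0.9 confidence threshold, and the use of logistic regression are all my assumptions for brevity, not the paper's setup):

```python
# Self-training sketch:
# 1) train an intermediate model on the human-labeled set,
# 2) pseudo-label the unlabeled set where the model is confident,
# 3) train the final model on labeled + pseudo-labeled data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, mean):  # hypothetical toy data: 1-D Gaussian blobs
    return rng.normal(mean, 1.0, size=(n, 1))

X_lab = np.vstack([sample(20, -2.0), sample(20, 2.0)])
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = np.vstack([sample(200, -2.0), sample(200, 2.0)])  # labels unknown

# Step 1: intermediate model on human-labeled data only.
intermediate = LogisticRegression().fit(X_lab, y_lab)

# Step 2: keep only confident pseudo labels (threshold is an assumption).
proba = intermediate.predict_proba(X_unlab)
confident = proba.max(axis=1) > 0.9
pseudo_y = proba.argmax(axis=1)[confident]

# Step 3: final model on the combined data.
X_all = np.vstack([X_lab, X_unlab[confident]])
y_all = np.concatenate([y_lab, pseudo_y])
final = LogisticRegression().fit(X_all, y_all)
```

In practice this is iterated (retrain, re-pseudo-label, repeat) and the confidence threshold controls how much label noise leaks into the final training set.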
Negative. We can do away with the labels. One can use self-supervised pretraining on all the available data (labeled and unlabeled, ignoring the labels) to learn meaningful representations, and then train on the actual classification task. This approach is shown to improve performance.
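A rough sketch of that two-stage recipe, pretrain a representation without labels, then train the classifier on top. Here the self-supervised pretext task (e.g. rotation prediction or contrastive learning in the real setting) is stood in by unsupervised PCA purely to show the flow; the toy data and imbalance ratio are my assumptions:

```python
# Stage 1: learn a representation from ALL data, using no labels.
# Stage 2: train the actual classifier on the encoded labeled data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def sample(n, mean):  # hypothetical toy data: 5-D Gaussian blobs
    return rng.normal(mean, 1.0, size=(n, 5))

X_lab = np.vstack([sample(30, -1.5), sample(5, 1.5)])  # class-imbalanced labels
y_lab = np.array([0] * 30 + [1] * 5)
X_unlab = np.vstack([sample(300, -1.5), sample(300, 1.5)])

# Stage 1: "pretraining" stand-in (PCA instead of a real pretext task),
# fit on labeled + unlabeled inputs with the labels ignored.
encoder = PCA(n_components=2).fit(np.vstack([X_lab, X_unlab]))

# Stage 2: supervised fine-tuning on the (imbalanced) labeled set.
clf = LogisticRegression().fit(encoder.transform(X_lab), y_lab)
```

The point of the recipe is that stage 1 never sees the imbalanced labels, so the learned representation is not skewed toward the majority classes.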
Takeaway: If you have class-imbalanced labels and plenty of unlabeled data, do self-training or self-supervised pretraining. (Self-training is shown to beat self-supervised pretraining on CIFAR-10-LT, though.)
Below are notes from here.
Keywords: imbalanced classification,
What is