NIPS 2020 | Uncertainty-aware Self-training for Text Classification with Few Labels

This paper studies self-training in the teacher-student framework for data augmentation: a teacher network labels unlabeled data, and a student network is then trained on these pseudo-labeled data.

The key problem is how to make the student network aware of the uncertainty of each pseudo-label, i.e., the student should treat pseudo-labeled examples differently according to how uncertain they are.

Hence, the paper addresses three questions: how to define uncertainty, how to sample pseudo-labeled data according to that uncertainty to train the student network, and how to train an uncertainty-aware student model (confident learning).

1. uncertainty

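They use Bayesian Active Learning by Disagreement (BALD), estimated with Monte Carlo dropout, to measure uncertainty. The BALD score B is the entropy of the teacher's mean output minus the average entropy of the outputs obtained with different dropout masks, so it captures how much those outputs disagree. The larger B is, the more uncertain the teacher is about the example.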
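A minimal sketch of the BALD estimate, assuming the teacher's class probabilities from T forward passes with dropout left active are already collected in a list of lists (`mc_probs[t][c]`). The function names `entropy` and `bald_score` are hypothetical, chosen for illustration only.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(mc_probs: List[List[float]]) -> float:
    """BALD estimate from T Monte Carlo dropout passes.

    mc_probs[t][c] is the teacher's probability for class c on pass t
    (hypothetical input layout: dropout kept on at inference time).
    """
    num_passes = len(mc_probs)
    num_classes = len(mc_probs[0])
    # Mean prediction over the T stochastic passes.
    mean_probs = [sum(p[c] for p in mc_probs) / num_passes for c in range(num_classes)]
    # Entropy of the mean prediction (total predictive uncertainty).
    total = entropy(mean_probs)
    # Average entropy of the individual passes.
    expected = sum(entropy(p) for p in mc_probs) / num_passes
    # The difference measures disagreement between dropout masks.
    return total - expected


if __name__ == "__main__":
    agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
    disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    print(bald_score(agree), bald_score(disagree))
```

When every pass gives the same distribution, the two entropy terms cancel and B is (numerically) zero; passes that disagree yield a positive B even if each individual pass looks confident.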

2. selection

Two strategies can be used: easy (preferentially select examples with lower uncertainty) and hard (preferentially select examples with higher uncertainty).
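The two strategies can be sketched as weighted sampling over the unlabeled pool. This is an illustration under stated assumptions, not the paper's exact procedure: we assume the easy strategy samples with probability proportional to 1 - B and the hard strategy proportional to B, with B first rescaled to [0, 1] by dividing by log(C) (BALD never exceeds the entropy of a C-class distribution). Sampling with replacement via `random.choices` is also our simplification, and the helper names are hypothetical.

```python
import math
import random
from typing import List


def selection_weights(bald_scores: List[float], num_classes: int, mode: str = "easy") -> List[float]:
    """Turn per-example BALD scores into sampling probabilities."""
    if mode not in ("easy", "hard"):
        raise ValueError("mode must be 'easy' or 'hard'")
    # Assumption: rescale B by its upper bound log(C) so it lies in [0, 1].
    max_score = math.log(num_classes)
    scaled = [min(max(b / max_score, 0.0), 1.0) for b in bald_scores]
    eps = 1e-8  # keeps every weight strictly positive
    if mode == "easy":
        raw = [1.0 - s + eps for s in scaled]  # favour low uncertainty
    else:
        raw = [s + eps for s in scaled]  # favour high uncertainty
    total = sum(raw)
    return [r / total for r in raw]


def sample_pseudo_labeled(indices: List[int], bald_scores: List[float],
                          num_classes: int, k: int, mode: str = "easy",
                          seed: int = 0) -> List[int]:
    """Draw k example indices (with replacement) according to the chosen strategy."""
    rng = random.Random(seed)
    weights = selection_weights(bald_scores, num_classes, mode)
    return rng.choices(indices, weights=weights, k=k)


if __name__ == "__main__":
    scores = [0.0, 0.3, 0.6]
    print(selection_weights(scores, 2, "easy"))
    print(selection_weights(scores, 2, "hard"))
    print(sample_pseudo_labeled([0, 1, 2], scores, 2, k=5, mode="easy"))
```

Switching `mode` flips which end of the uncertainty range is favoured, while every example keeps a nonzero chance of being picked.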

3. confident learning

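The selection step and the uncertainty definition above mainly rely on the mean of the predictions. Confident learning additionally incorporates the variance of the predictions into the final loss function, so that pseudo-labels whose predictions vary more across dropout passes contribute less. Note that the loss formula as printed in the paper is wrong: it is missing a '-' sign.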
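A rough sketch of a variance-weighted loss in this spirit. Assumptions, all ours for illustration: the confidence weight is log(1 / Var), where Var is the variance of the teacher's probability for the pseudo-label across MC dropout passes, and the student loss is a weighted negative log-likelihood (which is where the missing minus sign matters). The paper's exact variance term and notation may differ, and `variance_weight` and `confident_loss` are hypothetical names.

```python
import math
from typing import List


def variance_weight(mc_probs: List[List[float]], label: int, eps: float = 1e-6) -> float:
    """Assumed confidence weight log(1 / Var) for the pseudo-label's probability."""
    values = [p[label] for p in mc_probs]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # A probability's variance is at most 0.25, so the weight stays positive;
    # eps avoids division by zero when all passes agree exactly.
    return math.log(1.0 / (var + eps))


def confident_loss(student_probs: List[List[float]], labels: List[int],
                   weights: List[float]) -> float:
    """Weighted negative log-likelihood averaged over the batch."""
    total = 0.0
    for probs, y, w in zip(student_probs, labels, weights):
        total += -math.log(probs[y]) * w  # the minus sign makes this a loss to minimize
    return total / len(labels)


if __name__ == "__main__":
    stable = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
    noisy = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]
    w = [variance_weight(stable, 0), variance_weight(noisy, 0)]
    print(w)
    print(confident_loss([[0.8, 0.2], [0.8, 0.2]], [0, 0], w))
```

A pseudo-label on which the dropout passes agree gets a large weight, while a noisy one is down-weighted rather than discarded.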

richardbaihe / paperreading