This paper investigates self-training within a student-teacher data augmentation framework (unlabeled data is labeled by a teacher network, and a student network is then trained on the resulting pseudo-labeled data).
The key problem is how to make the student network aware of the uncertainty of the pseudo-labels, i.e., the student network should treat pseudo-labeled examples differently according to how uncertain they are.
Hence, the paper proposes how to define uncertainty, how to sample data according to that uncertainty when training the student network, and how to train the uncertainty-aware student model (confident learning).
1. uncertainty
They use Bayesian Active Learning by Disagreement (BALD) to measure the entropy of the teacher's outputs and the agreement between outputs obtained with different dropout parameters. The larger the BALD score B is, the more uncertain the example is.
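As a rough illustration of the score described above, here is a minimal NumPy sketch of the standard BALD quantity: the entropy of the averaged prediction minus the average entropy of the individual stochastic predictions. The function name `bald_score`, the array layout `(T, N, C)` (T dropout passes, N examples, C classes), and the `eps` smoothing term are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def bald_score(probs, eps=1e-12):
    """BALD score per example from stochastic (e.g. dropout) predictions.

    probs: array of shape (T, N, C) holding class probabilities from
           T stochastic forward passes of the teacher (assumed layout).
    Returns an array of shape (N,); larger means more uncertain.
    """
    probs = np.asarray(probs, dtype=float)
    # Entropy of the mean prediction across the T passes.
    mean_probs = probs.mean(axis=0)
    entropy_of_mean = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)
    # Mean of the per-pass entropies.
    per_pass_entropy = -(probs * np.log(probs + eps)).sum(axis=-1)
    mean_of_entropy = per_pass_entropy.mean(axis=0)
    # High when the passes are individually confident but disagree.
    return entropy_of_mean - mean_of_entropy
```

When all passes return the same distribution the score is close to zero; when the passes are each confident but pick different classes, the score approaches the log of the number of classes involved.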
2. selection
There are two strategies that can be used: easier (select more data with lower uncertainty) and harder (select more data with higher uncertainty).
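The two strategies can be sketched as sampling without replacement, with probabilities that decrease (easier) or increase (harder) with the uncertainty score. The exact weighting used in the paper is not given in this review, so the shift-and-normalize scheme below, the function names `sampling_probs` and `sample_indices`, and the `eps` and `seed` parameters are all assumptions for illustration.

```python
import numpy as np

def sampling_probs(scores, strategy="easier", eps=1e-8):
    """Turn uncertainty scores into sampling probabilities (assumed scheme)."""
    scores = np.asarray(scores, dtype=float)
    if strategy == "easier":
        # Lower uncertainty -> larger weight.
        weights = scores.max() - scores + eps
    elif strategy == "harder":
        # Higher uncertainty -> larger weight.
        weights = scores - scores.min() + eps
    else:
        raise ValueError("strategy must be 'easier' or 'harder'")
    return weights / weights.sum()

def sample_indices(scores, k, strategy="easier", seed=0):
    """Draw k distinct example indices according to the chosen strategy."""
    rng = np.random.default_rng(seed)
    p = sampling_probs(scores, strategy)
    return rng.choice(len(p), size=k, replace=False, p=p)
```

The small `eps` keeps every example selectable, so the procedure stays a soft preference rather than a hard threshold on uncertainty.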
3. confident learning
The selection and uncertainty definitions above mainly focus on the mean of the predictions. Here, the variance of the predictions is incorporated into the final loss function. The corresponding formula is wrong: a '-' is missing.
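To make the idea of a variance-aware loss concrete, the sketch below weights each example's negative log-likelihood by the negative log of the teacher's predictive variance, so low-variance pseudo-labels count more. This is only one plausible form and is not claimed to be the paper's exact loss or the corrected formula; the function name `variance_weighted_nll`, the `-log(variance)` weighting, and the array layouts are assumptions.

```python
import numpy as np

def variance_weighted_nll(student_probs, teacher_probs, pseudo_labels, eps=1e-12):
    """Variance-weighted negative log-likelihood (illustrative assumption).

    student_probs: (N, C) student class probabilities.
    teacher_probs: (T, N, C) teacher probabilities from T dropout passes.
    pseudo_labels: (N,) integer pseudo-labels.
    Returns (mean weighted loss, per-example weights).
    """
    student_probs = np.asarray(student_probs, dtype=float)
    teacher_probs = np.asarray(teacher_probs, dtype=float)
    pseudo_labels = np.asarray(pseudo_labels, dtype=int)
    idx = np.arange(len(pseudo_labels))
    # Variance across passes of the probability given to the pseudo-label.
    variance = teacher_probs.var(axis=0)[idx, pseudo_labels]
    # A probability's variance is at most 0.25, so this weight is positive
    # and grows as the teacher becomes more consistent.
    weights = -np.log(variance + eps)
    nll = -np.log(student_probs[idx, pseudo_labels] + eps)
    return float((weights * nll).mean()), weights
```

With this weighting, an example on which the dropout passes agree contributes much more to the loss than one on which they disagree, which matches the stated goal of making the student rely less on uncertain pseudo-labels.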
4. final results