microsoft / ProDA

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)
https://arxiv.org/abs/2101.10979
MIT License

Why the student is initialized with self-supervised weights rather than supervised weights #34

Closed kailigo closed 3 years ago

kailigo commented 3 years ago

Great work and thank you for sharing the code.

I noticed that in the distillation stage, you initialize your student model with the self-supervised weights learned by SimCLRv2. I am wondering why you do not use fully-supervised ImageNet weights instead; after all, the fully-supervised model is stronger than the self-supervised one. I guess the reason is not that you want to avoid using ImageNet labels, since you initialize your DeepLab model with fully-supervised weights in the first stage. So the question is: why not use the fully-supervised weights for the distillation stage as well?

Thanks.
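
For reference, the difference between the two choices comes down to which checkpoint the student backbone is loaded from before distillation starts. A minimal sketch of that step, assuming a ResNet-101 backbone, the modern torchvision weights API, and a placeholder path for a SimCLRv2 checkpoint converted to PyTorch format (none of these names are taken from the ProDA code itself):

```python
import torch
from torchvision.models import resnet101, ResNet101_Weights

def init_student_backbone(init_mode: str = "simclrv2"):
    """Build a ResNet-101 backbone and load pretraining weights.

    init_mode:
      "supervised" - fully-supervised ImageNet weights (as in the stage-1 DeepLab).
      "simclrv2"   - self-supervised SimCLRv2 weights converted to PyTorch
                     (the checkpoint path below is a placeholder).
    """
    if init_mode == "supervised":
        return resnet101(weights=ResNet101_Weights.IMAGENET1K_V1)

    backbone = resnet101(weights=None)
    # Hypothetical path to a converted SimCLRv2 checkpoint.
    state_dict = torch.load("pretrained/simclrv2_r101.pth", map_location="cpu")
    # strict=False because the SSL checkpoint carries no classifier head.
    backbone.load_state_dict(state_dict, strict=False)
    return backbone
```
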

zhangmozhe commented 3 years ago

Hi, thanks for the question. The paper "Big Self-Supervised Models are Strong Semi-Supervised Learners" shows that self-supervised models such as SimCLR are good few-shot learners: they can reach even better quality than supervised models when fine-tuned with only 1% of the labels. In our task, the pseudo labels still contain noise even after label denoising, so we apply a stricter confidence threshold to keep only a small set of confident pseudo labels and train the student on these sparse yet cleaner labels. We therefore initialize the student with self-supervised weights, since they are much less data-hungry and can quickly learn from a few labels.
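
Concretely, the stricter threshold amounts to keeping only the pixels whose (denoised) teacher confidence exceeds a high cutoff and ignoring everything else in the distillation loss. A rough sketch of that idea; the threshold value, tensor names, and the `ignore_index` convention are illustrative assumptions, not the exact ProDA code:

```python
import torch
import torch.nn.functional as F

def confident_pseudo_label_loss(student_logits, teacher_probs,
                                threshold=0.95, ignore_index=255):
    """Cross-entropy on only the high-confidence pseudo-labeled pixels.

    student_logits: (B, C, H, W) raw scores from the SSL-initialized student.
    teacher_probs:  (B, C, H, W) softmax probabilities from the teacher,
                    assumed to be already prototype-denoised.
    """
    confidence, pseudo_label = teacher_probs.max(dim=1)  # both (B, H, W)
    # Stricter threshold -> sparser but cleaner supervision for the student.
    pseudo_label = torch.where(
        confidence > threshold,
        pseudo_label,
        torch.full_like(pseudo_label, ignore_index),
    )
    return F.cross_entropy(student_logits, pseudo_label, ignore_index=ignore_index)
```

With a high cutoff only a small fraction of pixels survive, which is exactly where a less data-hungry, self-supervised initialization is expected to help.
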