microsoft / ProDA

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)
https://arxiv.org/abs/2101.10979
MIT License

Why the student is initialized with self-supervised weights rather than supervised weights #34

Closed kailigo closed 3 years ago

kailigo commented 3 years ago

Great work and thank you for sharing the code.

I noticed that in the distillation stage, you initialize your student model with the self-supervised weights learned by SimCLRv2. I am wondering why you do not use fully-supervised ImageNet weights instead; after all, the fully-supervised model is stronger than the self-supervised one. I guess the reason is not that you want to avoid using ImageNet labels, since you initialize your DeepLab model with fully-supervised weights in the first stage. So the question is: why not use the fully-supervised weights for the distillation stage as well?

Thanks.
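
For reference, the difference between the two choices comes down to which checkpoint the student backbone is loaded from before distillation starts. A minimal sketch of that step, assuming a ResNet-101 backbone, the modern torchvision weights API, and a placeholder path for a SimCLRv2 checkpoint converted to PyTorch format (none of these names are taken from the ProDA code itself):

```python
import torch
from torchvision.models import resnet101, ResNet101_Weights

def init_student_backbone(init_mode: str = "simclrv2"):
    """Build a ResNet-101 backbone and load pretraining weights.

    init_mode:
      "supervised" - fully-supervised ImageNet weights (as in the stage-1 DeepLab).
      "simclrv2"   - self-supervised SimCLRv2 weights converted to PyTorch
                     (the checkpoint path below is a placeholder).
    """
    if init_mode == "supervised":
        return resnet101(weights=ResNet101_Weights.IMAGENET1K_V1)

    backbone = resnet101(weights=None)
    # Hypothetical path to a converted SimCLRv2 checkpoint.
    state_dict = torch.load("pretrained/simclrv2_r101.pth", map_location="cpu")
    # strict=False because the SSL checkpoint carries no classifier head.
    backbone.load_state_dict(state_dict, strict=False)
    return backbone
```
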

zhangmozhe commented 3 years ago

Hi, thanks for the question. The paper "Big Self-Supervised Models are Strong Semi-Supervised Learners" shows that self-supervised models such as SimCLR are good few-shot learners: they can reach even better quality than supervised models when fine-tuned with only 1% of the labels. In our task, the pseudo labels still contain noise even after label denoising, so we apply a stricter confidence threshold to keep only a small set of confident pseudo labels and train the student on these sparse yet cleaner labels. We therefore initialize the student with self-supervised weights, since they are much less data-hungry and can quickly learn from a few labels.
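
Concretely, the stricter threshold amounts to keeping only the pixels whose (denoised) teacher confidence exceeds a high cutoff and ignoring everything else in the distillation loss. A rough sketch of that idea; the threshold value, tensor names, and the `ignore_index` convention are illustrative assumptions, not the exact ProDA code:

```python
import torch
import torch.nn.functional as F

def confident_pseudo_label_loss(student_logits, teacher_probs,
                                threshold=0.95, ignore_index=255):
    """Cross-entropy on only the high-confidence pseudo-labeled pixels.

    student_logits: (B, C, H, W) raw scores from the SSL-initialized student.
    teacher_probs:  (B, C, H, W) softmax probabilities from the teacher,
                    assumed to be already prototype-denoised.
    """
    confidence, pseudo_label = teacher_probs.max(dim=1)  # both (B, H, W)
    # Stricter threshold -> sparser but cleaner supervision for the student.
    pseudo_label = torch.where(
        confidence > threshold,
        pseudo_label,
        torch.full_like(pseudo_label, ignore_index),
    )
    return F.cross_entropy(student_logits, pseudo_label, ignore_index=ignore_index)
```

With a high cutoff only a small fraction of pixels survive, which is exactly where a less data-hungry, self-supervised initialization is expected to help.
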