Closed: kailigo closed this issue 3 years ago
Hi, thanks for this question. The paper "Big Self-Supervised Models are Strong Semi-Supervised Learners" finds that self-supervised models such as SimCLR are strong few-shot learners, achieving even better quality than a fully-supervised model when fine-tuned with just 1% of the labels. In our task, the pseudo labels still contain noise even after label denoising, so we apply an even harder confidence threshold to select a small set of confident labels and train the student on these sparse yet cleaner labels. We therefore initialize the student with self-supervised weights, since they are much less data-hungry and can learn quickly from only a few labels.
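For anyone landing here, a minimal PyTorch sketch of the two ingredients described above: keeping only high-confidence pseudo labels, and starting the student from self-supervised rather than fully-supervised ImageNet weights. The threshold value, checkpoint path, and the use of torchvision's `deeplabv3_resnet50` (torchvision >= 0.13) are assumptions for illustration, not the repository's exact code; SimCLRv2 checkpoints typically need a key-name conversion before they can be loaded into a torchvision ResNet backbone.

```python
import torch
import torch.nn.functional as F
from torchvision.models.segmentation import deeplabv3_resnet50

CONF_THRESH = 0.95   # assumed "hard" threshold; the actual value may differ
IGNORE_INDEX = 255   # pixels below the threshold are ignored by the loss

def select_confident_pseudo_labels(teacher_logits: torch.Tensor) -> torch.Tensor:
    """Convert teacher logits (N, C, H, W) into sparse pseudo labels.

    Pixels whose max softmax probability falls below CONF_THRESH are set to
    IGNORE_INDEX, so only confident (cleaner) labels supervise the student.
    """
    probs = F.softmax(teacher_logits, dim=1)
    conf, labels = probs.max(dim=1)            # both (N, H, W)
    labels[conf < CONF_THRESH] = IGNORE_INDEX
    return labels

# Initialize the student from self-supervised weights instead of the
# fully-supervised ImageNet checkpoint (path and key layout are hypothetical).
student = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=21)
ssl_state = torch.load("simclrv2_r50_converted.pth", map_location="cpu")
student.backbone.load_state_dict(ssl_state, strict=False)

# The distillation loss then only sees the confident pixels.
criterion = torch.nn.CrossEntropyLoss(ignore_index=IGNORE_INDEX)
```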
Great work and thank you for sharing the code.
I noticed that in the distillation stage, you initialize your student model with SSL weights learned by SimCLRv2. I am wondering why you do not use the fully-supervised ImageNet weights instead; after all, the fully-supervised model performs better than the SSL one. I assume this is not about avoiding ImageNet labels, since you initialize your DeepLab model with fully-supervised weights in the first stage. So why not use the fully-supervised weights for distillation as well?
Thanks.