Thank you very much for your work. I chose not to load the pre-trained model, and when training with 1323 labeled VOC12 examples, the highest mIoU I reached was only 0.5241. Also, I would like to ask why you did not choose SOTA supervised semantic segmentation models as the backbone architectures?
Hi @Ruirui-Huang,
Sorry, I might not fully understand your question. Please let me know if you are not satisfied with the answers below.
1). The ImageNet pre-trained model we use follows CPS and is introduced on this page (checkpoint section). Such a pre-trained checkpoint is necessary for all segmentation methods, as the prior knowledge within the backbone is essential for proper convergence (a loading sketch follows this list).
2). We don't have enough space in the paper for SOTA architecture studies, since we already run the extra PSPNet experiments for the peer methods (DARs). I believe a fair comparison must be based on consistent architectures (and backbones). A supervised architecture that reaches better mIoU would boost the semi-supervised performance as well, which reduces the importance of such experiments.
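For point 1), here is a minimal sketch of how such a backbone checkpoint is typically loaded before training; the torchvision weights string and the local file name are illustrative assumptions, not the repo's exact code:

```python
import torchvision

# Minimal sketch: initialise the segmentation backbone from an
# ImageNet pre-trained checkpoint before any (semi-)supervised training.
# torchvision's released weights are used here purely for illustration;
# the repo points to its own checkpoint in the "checkpoint" section.
backbone = torchvision.models.resnet101(weights="IMAGENET1K_V1")

# Loading a locally downloaded checkpoint instead (hypothetical file name):
# import torch
# state_dict = torch.load("resnet101_imagenet.pth", map_location="cpu")
# backbone.load_state_dict(state_dict, strict=False)  # skip non-backbone keys
```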
regards, yuyuan
I got it. Thank you very much! In addition, if I want to change the backbone architecture, do I need to train an additional pre-trained model on ImageNet?
Thanks for the great work! I have a related question about the pre-trained model, too. I found that the pre-trained initialisation is only applied to the student encoder, but not to the two teacher encoders. Is there any consideration behind this design?
Hi @Ruirui-Huang No, I think all the backbones for segmentation should come with an ImageNet pre-trained model.

@kevinshieh0225 It doesn't have much impact. The teacher networks' parameters will eventually be dominated by the student's parameters through the EMA transfer (sketched below). Intuitively, the randomly initialised teacher networks might even provide better divergence between the two teachers.
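A minimal sketch of that EMA transfer, assuming the usual mean-teacher-style update (the function name and `alpha` value are illustrative, not the repo's exact code):

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               alpha: float = 0.99) -> None:
    # teacher <- alpha * teacher + (1 - alpha) * student.
    # After enough steps the teacher becomes a smoothed copy of the
    # student, so its initial weights (random or pre-trained) matter little.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```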
Thanks for your reply!
Very welcome!