Thank you very much for your work. I chose not to load the pre-trained model, and when training with 1323 labeled VOC12 examples, the highest mIoU I reached was only 0.5241. Also, I would like to ask why you did not choose SOTA supervised semantic segmentation models as the backbone architectures?
Hi @Ruirui-Huang,
Sorry, I might not fully understand your question. Please let me know if you are not satisfied with the answers below.
1). The ImageNet pre-trained model we use follows CPS and is introduced on this page (checkpoint section). Such a pre-trained checkpoint is necessary for all segmentation methods, as the prior knowledge within the backbone is essential for proper convergence (a loading sketch follows this list).
2). We don't have enough space in the paper for SOTA architecture studies, since we already run the extra PSPNet experiments for the peer methods (DARs). I believe a fair comparison must be based on consistent architectures (and backbones). A supervised architecture that reaches better mIoU would boost the semi-supervised performance as well, which reduces the importance of such experiments.
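For point 1), here is a minimal sketch of how such a backbone checkpoint is typically loaded before training; the torchvision weights string and the local file name are illustrative assumptions, not the repo's exact code:

```python
import torchvision

# Minimal sketch: initialise the segmentation backbone from an
# ImageNet pre-trained checkpoint before any (semi-)supervised training.
# torchvision's released weights are used here purely for illustration;
# the repo points to its own checkpoint in the "checkpoint" section.
backbone = torchvision.models.resnet101(weights="IMAGENET1K_V1")

# Loading a locally downloaded checkpoint instead (hypothetical file name):
# import torch
# state_dict = torch.load("resnet101_imagenet.pth", map_location="cpu")
# backbone.load_state_dict(state_dict, strict=False)  # skip non-backbone keys
```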
regards, yuyuan
I got it. Thank you very much! In addition, if I want to change the backbone architecture, do I need to train an additional pre-trained model on ImageNet?
Thanks for the great work! I have a related question about the pre-trained model, too. I found that the pre-trained initialisation is only applied to the student encoder, but not to the two teacher encoders. Is there any consideration behind this design?
Hi @Ruirui-Huang No, I think all the backbones for segmentation should come with an ImageNet pre-trained model.

@kevinshieh0225 It doesn't have much impact. The teacher networks' parameters will eventually be dominated by the student's parameters through the EMA transfer (sketched below). Intuitively, the randomly initialised teacher networks might even provide better divergence between the two teachers.
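A minimal sketch of that EMA transfer, assuming the usual mean-teacher-style update (the function name and `alpha` value are illustrative, not the repo's exact code):

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               alpha: float = 0.99) -> None:
    # teacher <- alpha * teacher + (1 - alpha) * student.
    # After enough steps the teacher becomes a smoothed copy of the
    # student, so its initial weights (random or pre-trained) matter little.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```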
Thanks for your reply!
Very welcome!