yyliu01 / PS-MT

[CVPR'22] Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
https://arxiv.org/pdf/2111.12903.pdf
MIT License

Results of supervised baseline #2

Closed LiheYoung closed 2 years ago

LiheYoung commented 2 years ago

Hi,

Did you apply any extra techniques to your supervised baseline, such as setting the output stride to 8, an auxiliary loss, or OHEM? I ask because your reported baseline results on the Pascal dataset, shown in Figure 3, are very high.

yyliu01 commented 2 years ago

No, we don't apply any tricks for supervised training, apart from the deep-stem blocks and SyncBN, following the CPS approach. We also keep the number of training iterations on the labelled data the same between the supervised and semi-supervised settings.
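
For concreteness, the two components above can be sketched roughly as follows in PyTorch. The deep-stem channel widths are illustrative (the ResNet-C style stem), not copied from our code, while `convert_sync_batchnorm` is PyTorch's standard way to obtain SyncBN:

```python
import torch.nn as nn
import torchvision

# Illustrative "deep stem" (ResNet-C style): the single 7x7 stride-2 conv
# is replaced by three stacked 3x3 convs. Channel widths are assumptions.
def deep_stem(in_ch: int = 3, stem_width: int = 64) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, stem_width // 2, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(stem_width // 2),
        nn.ReLU(inplace=True),
        nn.Conv2d(stem_width // 2, stem_width // 2, 3, padding=1, bias=False),
        nn.BatchNorm2d(stem_width // 2),
        nn.ReLU(inplace=True),
        nn.Conv2d(stem_width // 2, stem_width, 3, padding=1, bias=False),
        nn.BatchNorm2d(stem_width),
        nn.ReLU(inplace=True),
    )

backbone = torchvision.models.resnet50()
# SyncBN: swap every BatchNorm layer for its synchronized counterpart,
# so statistics are aggregated across GPUs during distributed training.
backbone = nn.SyncBatchNorm.convert_sync_batchnorm(backbone)
```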

Given that we use the teacher network to perform inference in all settings, one hypothesis is that the self-ensembled teacher boosts the baseline even in the supervised case.
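
For readers unfamiliar with the mechanism: the teacher is a temporal ensemble of the student, updated by an exponential moving average of the student's weights rather than by gradients. A minimal sketch of the standard mean-teacher update (the decay value 0.99 is illustrative, not necessarily what PS-MT uses):

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               alpha: float = 0.99) -> None:
    # Teacher weights track an exponential moving average of the student's,
    # so the teacher is a smoothed "self-ensemble" of past students.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1.0 - alpha)
    # BatchNorm running stats and other buffers are copied directly here
    # (one common convention; implementations differ).
    for tb, sb in zip(teacher.buffers(), student.buffers()):
        tb.copy_(sb)
```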

The code for VOC12 will be released very soon, and we appreciate your interest.

LiheYoung commented 2 years ago

OK, thanks. I notice that, according to Figure 3, your supervised baseline results on the Pascal 1/8 setting are ~71 (RN-50) and ~74 (RN-101), while the corresponding results from CPS are 69.43 (RN-50) and 72.21 (RN-101). This suggests the EMA teacher may boost the student by ~1.5%.

I wonder whether the EMA teacher is still superior to the online student by ~1.5% in the final test of the semi-supervised setting.

Besides, somewhat strangely, your Cityscapes supervised baseline results do not show an improvement over CPS.

yyliu01 commented 2 years ago

No, the teachers and the student eventually converge to similar local minima in the semi-supervised experiments; I believe the mIoU gap is less than 1.5% across all partition protocols. By comparing our data loader with CPS's (https://github.com/charlesCXK/TorchSemiSeg/blob/f67b37362ad019570fe48c5884187ea85f2cc045/furnace/datasets/BaseDataset.py), I found that we additionally apply input perturbations (e.g., colour jittering, Gaussian blur, and so on). Perhaps this richer input augmentation also brings some improvement to supervised training?
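
As a rough illustration of that extra augmentation (all probabilities and ranges below are hypothetical, not read from our data loader):

```python
import torchvision.transforms as T

# Photometric perturbations applied to the image only (not the label mask),
# so they drop in for both supervised and semi-supervised training.
# Parameter values are illustrative assumptions.
input_perturb = T.Compose([
    T.RandomApply([T.ColorJitter(brightness=0.5, contrast=0.5,
                                 saturation=0.5, hue=0.25)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.RandomApply([T.GaussianBlur(kernel_size=7)], p=0.5),
])
```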

Apologies for the confusion about the Cityscapes setting. The supervised baseline graph for Cityscapes uses plain CE loss and a smaller resolution (712×712), as described in Section 4.1. These settings lead to lower results than CPS (which uses 800×800 and OHEM), but they keep the comparison fair with other previous works (e.g., CAC, ECS). We also compare against CPS under their setting (as shown in Tab. 2) to demonstrate the effectiveness of our method.
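
For context, OHEM cross-entropy keeps only the hardest pixels when averaging the per-pixel loss, which typically raises Cityscapes numbers relative to plain CE. A minimal sketch (the `thresh` and `min_kept` values are illustrative, not taken from CPS):

```python
import torch
import torch.nn.functional as F

def ohem_ce(logits, target, thresh=0.7, min_kept=100_000, ignore_index=255):
    # Per-pixel cross-entropy, flattened over batch and spatial dims.
    pixel_loss = F.cross_entropy(logits, target, ignore_index=ignore_index,
                                 reduction='none').view(-1)
    with torch.no_grad():
        prob = F.softmax(logits, dim=1)
        valid = target != ignore_index
        gt = target.clone()
        gt[~valid] = 0  # placeholder class index for ignored pixels
        # Probability the model assigns to the ground-truth class per pixel.
        pt = prob.gather(1, gt.unsqueeze(1)).squeeze(1).view(-1)
        pt[~valid.view(-1)] = 1.0  # ignored pixels are never "hard"
        hard = pt < thresh  # hard = low confidence on the true class
        if hard.sum() < min_kept:
            # Fall back to the min_kept lowest-confidence pixels.
            k = min(min_kept, pt.numel())
            _, idx = pt.topk(k, largest=False)
            hard = torch.zeros_like(hard)
            hard[idx] = True
    selected = pixel_loss[hard]
    return selected.mean() if selected.numel() > 0 else pixel_loss.sum() * 0
```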

LiheYoung commented 2 years ago

Okay, I got it. Thanks.