microsoft / ProDA

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)
https://arxiv.org/abs/2101.10979
MIT License
286 stars, 44 forks

Is warm-up a critical part of the full model's performance? #29

Closed: lyxok1 closed this issue 3 years ago

lyxok1 commented 3 years ago

Hi, thanks for sharing the code. As mentioned in the README.md, a warm-up model is used to start the three-stage training process; judging from your code, it appears to be a pretraining step with adversarial training. However, the paper says little about this part. The warm-up model initializes the Basemodel in stage 1, and according to the training instructions, each stage relies heavily on the model trained in the previous stage (either to initialize the Basemodel or to initialize a Basemodel_ema). If I replace the DA warm-up model with a regular source-only model as the starting point, will there be a severe chain effect on the downstream stages?
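The start-up chain described in the question can be sketched as a toy pipeline. This is a hypothetical illustration, not ProDA's training code. `Checkpoint`, `run_stage`, and `run_pipeline` are made-up names, the weights are a plain dict, each "training step" just adds a constant, and the Basemodel_ema model is left out. The sketch only shows that whichever model starts stage 1 becomes the root of every later checkpoint.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Weights = Dict[str, float]


@dataclass
class Checkpoint:
    """Toy checkpoint (hypothetical): named weights plus the checkpoint it was initialized from."""
    name: str
    weights: Weights
    parent: Optional["Checkpoint"] = None

    def lineage(self) -> List[str]:
        """Checkpoint names from the root (the start-up model) down to this one."""
        chain: List[str] = []
        node: Optional[Checkpoint] = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain[::-1]


def run_stage(name: str, init: Checkpoint, steps: int = 10, step: float = 0.1) -> Checkpoint:
    """Stand-in for one training stage (hypothetical).

    The model starts from `init`'s weights, and each "update" adds `step` to
    every weight. Real training is far more complex; this only models the hand-off.
    """
    weights = dict(init.weights)
    for _ in range(steps):
        weights = {k: v + step for k, v in weights.items()}
    return Checkpoint(name, weights, parent=init)


def run_pipeline(start: Checkpoint, stages=("stage1", "stage2", "stage3")) -> Checkpoint:
    """Chain the stages: each one is initialized from the previous stage's output."""
    ckpt = start
    for stage_name in stages:
        ckpt = run_stage(stage_name, ckpt)
    return ckpt


# Hypothetical start-up models: a DA warm-up model vs. a source-only model.
warmup = Checkpoint("warmup_adv", {"w": 0.5})
source_only = Checkpoint("source_only", {"w": 0.2})

final_a = run_pipeline(warmup)
final_b = run_pipeline(source_only)
print(final_a.lineage())  # ['warmup_adv', 'stage1', 'stage2', 'stage3']
print(final_b.lineage())  # ['source_only', 'stage1', 'stage2', 'stage3']
```

Because the toy update is purely additive, the 0.3 gap between the two start-up models survives unchanged into stage 3. In real training the gap can shrink or grow, so the sketch says nothing about its actual size.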

panzhang0104 commented 3 years ago

Yes, warm-up is a critical part of the final performance. If we do not use warm-up, the final results drop by 2-3 mIoU.
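For reference, mIoU is the mean over classes of the per-class IoU, TP / (TP + FP + FN), and it is usually reported in percentage points, so a 2-3 mIoU drop means the class-averaged IoU falls by 2-3 points. Below is a minimal sketch of computing it from a confusion matrix. The function name `mean_iou` and the row/column convention (rows are ground truth, columns are predictions) are assumptions for illustration, not ProDA's evaluation code.

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean IoU (in %) from a square confusion matrix.

    confusion[i][j] = number of pixels with ground-truth class i predicted as class j.
    Per-class IoU = TP / (TP + FP + FN); classes with an empty union are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # pixels of class c predicted as something else
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other classes predicted as c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    if not ious:
        raise ValueError("confusion matrix contains no labeled or predicted pixels")
    return 100.0 * sum(ious) / len(ious)


print(round(mean_iou([[8, 2], [1, 9]]), 2))  # 73.86
```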