qinenergy / corda

[ICCV 2021] Code for our paper Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

gta2city #6

Closed xiaoachen98 closed 3 years ago

xiaoachen98 commented 3 years ago

When I revisited the performance of your GTA2City model, I found that the mIoU only reached about 54.8 after 250,000 iterations. I didn't change anything except using CUDA 10.2. Could you please provide the training log of your GTA2City run? Thanks a lot!

qinenergy commented 3 years ago

To reproduce our results, please follow the instructions in the README and create the same environment. Could you rerun your training using the instructions provided there?

conda update conda
conda env create -f environment.yml
conda activate corda 

Our reported results were trained and evaluated on an AWS p3.2xlarge (V100). Here is one example training log for GTA2Cityscapes. Thanks for your interest in our work.

tudragon154203 commented 3 years ago

Hi @qinenergy , I also get a worse GTA2City validation result than the paper reported. My result: 51.1 mIoU at 82k steps. In your training log, it is already around 53-54 mIoU at that point. Here is what I suspect the problems to be:

  1. Your config['evaluation']['label_size'] is set to [512,1024], whereas I believe it is common practice to use full-size Cityscapes images & labels ([1024,2048]) during evaluation. So I changed that in my config and obtained the eval mIoU above.
  2. The evaluateUDA.py code differs from other mIoU-calculation code. For example, IAST (https://github.com/Raykoooo/IAST/blob/24db9912403a01faf34884d60c9778cd48651bde/code/sseg/datasets/metrics/miou.py) and ProDA (https://github.com/microsoft/ProDA/blob/main/metrics.py) both implement that snippet differently (see the sketch below). Could this difference lead to unreliable results?
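
For reference, all of these implementations reduce to the same confusion-matrix computation; a minimal sketch of that standard formulation (illustrative, not the exact code from any of these repos):

    import numpy as np

    def fast_hist(label, pred, num_classes):
        # Accumulate a num_classes x num_classes confusion matrix,
        # ignoring pixels whose label falls outside [0, num_classes).
        mask = (label >= 0) & (label < num_classes)
        return np.bincount(
            num_classes * label[mask].astype(int) + pred[mask],
            minlength=num_classes ** 2,
        ).reshape(num_classes, num_classes)

    def per_class_iou(hist):
        # Per-class IoU = TP / (TP + FP + FN), read off the matrix.
        return np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))

    # mIoU over a validation set: sum histograms over images, then average.
    # hist = sum(fast_hist(gt.flatten(), pred.flatten(), 19) for gt, pred in pairs)
    # miou = np.nanmean(per_class_iou(hist))
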
qinenergy commented 3 years ago

@tudragon154203 Thanks for your interest in our work!

  1. The intermediate performance can vary while the model is still training. Please check the final performance at 250k iterations for comparison; we did not report any performance at iteration 82k in the paper, so could you revise your statement? May I also ask whether you used the environment provided in the repository to train the model? Could you keep the same code/config and run the training for the full 250k iterations to reproduce our results?

  2. The result reported in the paper is based on the evaluation script (shells/eval_gta2city.sh) using resolution [1024, 2048]; this is set by the --full-resolution flag in shells/eval_gta2city.sh, which overrides the config file (see the sketch after this list). This follows common practice.

    python3 evaluateUDA.py --full-resolution -m deeplabv2_gta --model-path ./checkpoint/gta/checkpoint-iter250000.pth
  3. We use the evaluation code from DACS, which is the main baseline we compare against in our ablations. The same evaluation code has also been used by other popular papers such as ClassMix.

  4. The final performance at 250k can vary a bit because of randomness; we observed roughly 55-58 mIoU across our multiple runs. This is also why we did not report the best model we saved (57.7) and instead reported 56.6 mIoU in the paper.
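
To illustrate point 2: the command-line flag takes precedence over the config value, along these lines (a schematic of the behaviour described above, not our actual evaluateUDA.py):

    import argparse

    # Stand-in for the value read from the config file.
    config = {"evaluation": {"label_size": [512, 1024]}}

    parser = argparse.ArgumentParser()
    parser.add_argument("--full-resolution", action="store_true",
                        help="evaluate on full-size [1024, 2048] Cityscapes labels")
    args = parser.parse_args()

    # The flag overrides whatever label_size the config file sets.
    if args.full_resolution:
        config["evaluation"]["label_size"] = [1024, 2048]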

Here are more training logs for GTA2Cityscapes from different runs, which may be useful for you. One run achieves 56.8 mIoU in the end, one achieves 56.3, and another achieves 57.2. Internally, we trained this version of CorDA more than 5 times, and the performance is consistently around the number reported in the paper (56.6).

tudragon154203 commented 3 years ago

Thanks for your response. I will update my result once training completes.

xiaoachen98 commented 3 years ago

Which video data did you use when running Monodepth2 to obtain depth information for Cityscapes and GTA5?

lhoyer commented 3 years ago

We used leftImg8bit_sequence_trainvaltest.zip for Cityscapes and https://playing-for-benchmarks.org/download/ for GTA. Please refer to #5 for more details.
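
If you want to generate the depth estimates yourself, the inference step with a trained Monodepth2 model looks roughly like the following. This is a sketch based on the public nianticlabs/monodepth2 codebase (see its test_simple.py for the canonical loading code); the model path here is a placeholder:

    import os
    import torch
    from PIL import Image
    from torchvision import transforms

    import networks  # provided by the monodepth2 repository

    MODEL_DIR = "models/mono_640x192"  # placeholder: directory of a trained model
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load the encoder; its checkpoint also stores the training feed size.
    encoder = networks.ResnetEncoder(18, False)
    enc_dict = torch.load(os.path.join(MODEL_DIR, "encoder.pth"), map_location=device)
    feed_h, feed_w = enc_dict["height"], enc_dict["width"]
    encoder.load_state_dict({k: v for k, v in enc_dict.items()
                             if k in encoder.state_dict()})

    decoder = networks.DepthDecoder(num_ch_enc=encoder.num_ch_enc, scales=range(4))
    decoder.load_state_dict(torch.load(os.path.join(MODEL_DIR, "depth.pth"),
                                       map_location=device))
    encoder.to(device).eval()
    decoder.to(device).eval()

    def predict_disparity(image_path):
        """Return the predicted disparity map for one video frame."""
        img = Image.open(image_path).convert("RGB")
        orig_w, orig_h = img.size
        x = transforms.ToTensor()(
            img.resize((feed_w, feed_h), Image.LANCZOS)).unsqueeze(0).to(device)
        with torch.no_grad():
            disp = decoder(encoder(x))[("disp", 0)]
            # Upsample the prediction back to the original resolution.
            disp = torch.nn.functional.interpolate(
                disp, (orig_h, orig_w), mode="bilinear", align_corners=False)
        return disp.squeeze().cpu().numpy()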