liyunsheng13 / BDL

MIT License
222 stars 30 forks

How to evaluate the provided pre-trained models to get the same results as in the paper #27

Closed wj-zhang closed 4 years ago

wj-zhang commented 4 years ago

Hi, I am sorry to disturb you again. I was trying to evaluate the pre-trained models provided by this project, but I ran into some difficulties. Can you give me some suggestions? Thanks in advance!

The README file provides four pre-trained models: GTA5_deeplab, GTA5_VGG, SYNTHIA_deeplab, and SYNTHIA_VGG. My understanding is that I can reproduce the paper's results by running evaluation.py on the test dataset. However, the results I got are as follows (a brief sketch of the mIoU metric follows the list):

  1. GTA5→Cityscapes, DeepLab (48.5 in paper): the result matches the paper

    python evaluation.py --restore-from gta_2_city_deeplab --model DeepLab --save test
    ===> mIoU19: 48.52
  2. GTA5→Cityscapes, VGG (41.3 in paper): the result is slightly lower than reported in the paper

    python evaluation.py --restore-from gta_2_city_vgg --model VGG --save test
    ===> mIoU19: 41.06
  3. SYNTHIA→Cityscapes, DeepLab (51.4 in paper): the result is slightly lower than reported in the paper

    python evaluation.py --restore-from syn_2_city_deeplab --model DeepLab --save test
    ===> mIoU13: 51.32
  4. SYNTHIA→Cityscapes, VGG (39.0 in paper): the result is slightly lower than reported in the paper

    python evaluation.py --restore-from syn_2_city_vgg --model VGG --save test
    ===> mIoU16: 38.81
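
For context, the mIoU19, mIoU13, and mIoU16 scores above are the mean IoU averaged over 19, 13, and 16 evaluated classes, respectively (the 13/16-class subsets are the usual SYNTHIA→Cityscapes protocols). Below is a minimal, hypothetical sketch of how such a score is typically accumulated from a confusion matrix; it only illustrates the metric and is not the actual code in evaluation.py.

    import numpy as np

    def fast_hist(label, pred, num_classes):
        """Accumulate a num_classes x num_classes confusion matrix."""
        mask = (label >= 0) & (label < num_classes)
        return np.bincount(
            num_classes * label[mask].astype(int) + pred[mask],
            minlength=num_classes ** 2,
        ).reshape(num_classes, num_classes)

    def mean_iou(hist):
        """Per-class IoU = TP / (TP + FP + FN); mIoU is their mean."""
        iou = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
        return np.nanmean(iou)

    # hist would be summed over all validation images, with num_classes = 19
    # for GTA5 -> Cityscapes and the 13/16-class subset for SYNTHIA -> Cityscapes.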
liyunsheng13 commented 4 years ago

Does a ~0.1 difference really matter?

wj-zhang commented 4 years ago

Does a ~0.1 difference really matter?

I just want to confirm that the released pre-trained models are correct and that my evaluation procedure is right. Also, can I report the results I obtained with the released models in my manuscript?

liyunsheng13 commented 4 years ago

I think the model I uploaded is correct. Since I repeated the same experiments several times, the one I uploaded to GitHub might be slightly different from the one used to report the numbers in the paper. I still suggest using the results from the paper if the difference is only ~0.1. By the way, the results can also be influenced by the order of the softmax and upsampling steps in the evaluation code; you can simply try swapping them. I'm sure it will give you a slightly different result.
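
For reference, here is a minimal sketch of the two orderings being described, written in PyTorch; the tensor shapes and interpolation settings are placeholders and may not match evaluation.py exactly.

    import torch
    import torch.nn.functional as F

    logits = torch.randn(1, 19, 65, 129)   # low-resolution network output (placeholder shape)
    target_size = (1024, 2048)             # Cityscapes label resolution

    # Order A: upsample the logits first, then apply softmax
    prob_a = F.softmax(
        F.interpolate(logits, size=target_size, mode='bilinear', align_corners=True),
        dim=1)

    # Order B: apply softmax first, then upsample the probabilities
    prob_b = F.interpolate(
        F.softmax(logits, dim=1),
        size=target_size, mode='bilinear', align_corners=True)

    # Bilinear interpolation and softmax do not commute, so the argmax label maps
    # (and therefore the mIoU) can differ slightly between the two orderings.
    print((prob_a.argmax(1) != prob_b.argmax(1)).float().mean())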

wj-zhang commented 4 years ago

I think the model I uploaded is correct. Since I repeated the same experiments several times, the one I uploaded to GitHub might be slightly different from the one used to report the numbers in the paper. I still suggest using the results from the paper if the difference is only ~0.1. By the way, the results can also be influenced by the order of the softmax and upsampling steps in the evaluation code; you can simply try swapping them. I'm sure it will give you a slightly different result.

Got it. Thank you so much for your reply and suggestions!