nv-tlabs / GSCNN

Gated-Shape CNN for Semantic Segmentation (ICCV 2019)
https://nv-tlabs.github.io/GSCNN/

IoU on Cityscapes test set cannot reach the result reported in the paper #23

Open Euphoria16 opened 4 years ago

Euphoria16 commented 4 years ago

Your work is great and thanks for sharing the code.

However, when I run your pre-trained model on the Cityscapes test set and submit to the evaluation server, the average IoU is 79.7%, which is lower than the 82.8% reported in the paper.

Are there any other special techniques or settings you have adopted for testing?

Looking forward to your reply.

shoutOutYangJie commented 4 years ago

Did you test the pre-trained model with multi-scale inference?

Euphoria16 commented 4 years ago

Thanks for answering my question!

But how do I merge the multi-scale testing results? The scripts don't seem to include this. After scaling, the outputs also have different sizes. Should I resize them to the original size and then average them?

Thanks again.

shoutOutYangJie commented 4 years ago

Yes, exactly right.


Euphoria16 commented 4 years ago

Great! Thank you very much.

Euphoria16 commented 4 years ago

Sorry to bother you again. I tried this and reached 80.00%, still short of 82.8%, so my method may not be correct. Did you mean resizing the outputs of the different scales and then averaging them? But the outputs are class IDs. For example, for scales 0.5, 1.0, and 2.0, suppose the outputs at one particular pixel are 6, 6, and 11 respectively (representing, say, car, car, person); averaging those leads to an irrelevant class. That's the problem.

tovacinni commented 4 years ago

You should be averaging the outputs before argmax-ing, if that is what you are doing. We do have custom inference code, but we will likely not be releasing it.
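For reference, here is a minimal sketch of averaging before the argmax (this is not the repo's custom inference code; the `multi_scale_inference` helper, the scale set, and the choice to average softmax probabilities rather than raw logits are all illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_inference(model, image, num_classes=19,
                          scales=(0.5, 1.0, 2.0)):
    """Run the model at several scales, average the per-class scores
    at the original resolution, and argmax only once at the end.

    image: (1, 3, H, W) normalized tensor; 19 classes for Cityscapes.
    The scale set is illustrative, not the one used for the paper.
    """
    _, _, h, w = image.shape
    avg_prob = torch.zeros(1, num_classes, h, w, device=image.device)
    model.eval()
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s,
                               mode='bilinear', align_corners=False)
        out = model(scaled)
        # The forward pass may return (seg_logits, edge_out); keep the
        # segmentation logits.
        logits = out[0] if isinstance(out, (tuple, list)) else out
        # Resize the scores back to the original resolution BEFORE
        # averaging -- never average the argmax'd class ids.
        logits = F.interpolate(logits, size=(h, w),
                               mode='bilinear', align_corners=False)
        avg_prob += F.softmax(logits, dim=1)
    avg_prob /= len(scales)
    return avg_prob.argmax(dim=1)  # (1, H, W) predicted class ids
```

Averaging in probability (or logit) space keeps the per-class evidence from each scale; averaging class IDs mixes unrelated labels, which is exactly the problem described above.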

Another thing to note: we forgot to include this in our original arXiv submission (it is in our ICCV submission, and we will update the arXiv version soon), but the test-set model is pre-trained on the Mapillary dataset, which gives another boost relative to our validation model. We also fine-tune the test-set model a bit more.

Euphoria16 commented 4 years ago

Thanks for your reply! I understand.

I believe pre-training on the Mapillary dataset would be very beneficial because of the domain similarity. But have you tried not using Mapillary data? What mean IoU can the model reach without Mapillary pre-training?

tovacinni commented 4 years ago

We haven't done much testing without Mapillary on the test set, since test-set submissions are limited.

On the validation set, however, we report the performance without Mapillary in the paper.

yoon307 commented 4 years ago

Thank you for the nice work. I ran into a similar problem: with multi-scale inference (with interpolation) I get 80.4% mIoU on the test set, using the pre-trained checkpoint provided on GitHub. Are the checkpoint used for the test set and the architecture different? (Should I change some setting or code to switch the regular stream from ResNet-101 to WideResNet?)