xh-liu / CC-FPSE

Code for NeurIPS 2019 paper "Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis"

Questions regarding mIoU and accuracy #4

Open justin-hpcnt opened 4 years ago

justin-hpcnt commented 4 years ago

Hi,

Thank you for sharing the code and for replying to my previous question! While reproducing the metrics, I have some questions:

  1. I'm referring to the SPADE issue to implement the evaluation code. Did you use the same repo and pre-trained weights for evaluation?

  2. If so, regarding the COCO-Stuff dataset, the original DeepLab v2 reports 66.8 pixel accuracy and 39.1 mIoU on the ground-truth validation images. However, CC-FPSE reaches 70.7 pixel accuracy and 41.6 mIoU, which seems odd. I think the difference might come from the input size fed to the DeepLab model. How did you feed inputs to the DeepLab network? (For example, did you use the 256x256 image directly, or upsample the 256x256 image to 321x321 with bilinear interpolation?)

xh-liu commented 4 years ago

Hello,

  1. Yes, I used the same repo and pre-trained weights for evaluation.

  2. For the original DeepLab v2 evaluation, they use the original size of the images and labels from the dataset. In our evaluation, we use the size of the generated images (256x256) as the input size to the DeepLab model. The label maps are resized to 256x256 with nearest-neighbor interpolation to match the size of the generated images (a short sketch of this resizing follows at the end of this comment). This is a difference between our evaluation and the original DeepLab v2 evaluation, and it might be a reason why our scores are slightly higher.

  3. Moreover, it can happen that generated images score slightly higher than real images under the same segmentation model. Because of label noise in the evaluation set, the real images may not be strictly aligned with the label maps. The generated images, however, are produced directly from the label maps, so there is no such label-noise issue. Higher mIoU and pixel accuracy only mean that the images align better with the ground-truth segmentation maps; they don't mean the images are more realistic than the real ones.
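
A minimal sketch of the resizing described in point 2, assuming a PIL-based helper (the function name and paths are illustrative, not the authors' actual evaluation script). The key detail is that label maps use nearest-neighbor interpolation so the discrete class ids are preserved:

    import numpy as np
    from PIL import Image

    def prepare_eval_pair(image_path, label_path, size=(256, 256)):
        # Generated image: usually already 256x256, so the resize is a no-op.
        image = Image.open(image_path).convert('RGB').resize(size, Image.BILINEAR)
        # Label map: nearest neighbor keeps the discrete class ids intact
        # (bilinear would blend neighboring ids into invalid values).
        label = Image.open(label_path).resize(size, Image.NEAREST)
        return np.array(image), np.array(label)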

zkchen95 commented 4 years ago

Hi @xh-liu, I have a question about the quantitative evaluation. For the Cityscapes dataset, I ran the evaluation scripts and got mIoU=65.1, accuracy=93.9, FID=53.53. This differs from the results in your paper; in particular, the accuracy is clearly higher than your 82.3. Do you use any specific settings in the segmentation scripts?

xh-liu commented 4 years ago

@ZzzackChen That's weird. I just tested the model again and it's still 82.3 pixel accuracy. I use the model and code from https://github.com/fyu/drn. The calculation of pixel accuracy is not provided in that code; how did you implement it?

zkchen95 commented 4 years ago

@xh-liu

    import numpy as np

    # hist: per-class confusion matrix accumulated over the validation set
    # (rows: ground-truth classes, columns: predicted classes).

    # Mean pixel accuracy
    acc = np.diag(hist).sum() / (hist.sum() + 1e-12)

    # Per-class accuracy
    cl_acc = np.diag(hist) / (hist.sum(1) + 1e-12)

    # Per-class IoU
    iu = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist) + 1e-12)
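
(For context, hist above is the confusion matrix over the whole validation set. Below is a minimal sketch of how such a matrix is commonly accumulated; the fast_hist-style helper and the `pairs` iterable are assumptions, not code from the DRN repo.)

    import numpy as np

    def fast_hist(label, pred, n_classes):
        # Accumulate an n_classes x n_classes confusion matrix, skipping pixels
        # whose label falls outside [0, n_classes), e.g. the 255 ignore label.
        valid = (label >= 0) & (label < n_classes)
        return np.bincount(
            n_classes * label[valid].astype(int) + pred[valid],
            minlength=n_classes ** 2,
        ).reshape(n_classes, n_classes)

    n_classes = 19  # Cityscapes training classes
    hist = np.zeros((n_classes, n_classes))
    for pred, label in pairs:  # hypothetical iterable of (prediction, label) arrays
        hist += fast_hist(label.flatten(), pred.flatten(), n_classes)
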
justin-hpcnt commented 4 years ago

@xh-liu Thanks a lot! Now I can reproduce the results :D

xh-liu commented 4 years ago

@ZzzackChen If you ignore the 255 labels, the result will be 93 as you calculated. If you count 255, the result will be 82.3. To keep consistent with the SPADE paper (https://arxiv.org/pdf/1903.07291.pdf), I chose the second calculation method for the Cityscapes dataset. For the COCO-Stuff and ADE20K datasets, the pixel accuracy calculation is included in the evaluation code, and I used the calculation method from the original code.
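
(In terms of the snippet above, the difference would look roughly like the sketch below. This is one plausible reading of the two options, not confirmed code; total_pixels is a hypothetical count of all label-map pixels, including those labeled 255.)

    # Option 1: ignore label-255 pixels entirely (hist is built without them) -> ~93
    acc_ignore = np.diag(hist).sum() / (hist.sum() + 1e-12)

    # Option 2: include label-255 pixels in the denominator, where they can never
    # be predicted correctly -> ~82.3 (assumed interpretation)
    acc_with_255 = np.diag(hist).sum() / (total_pixels + 1e-12)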

zkchen95 commented 4 years ago

@xh-liu Thank you! Now I get it!

tlatlbtle commented 4 years ago

Hi, I found that in the original paper the FID for the Cityscapes dataset is 71.8 instead of the 53.53 reported here. What explains this weird result?

xh-liu commented 4 years ago

@wjbKimberly The FID for Cityscapes reported in our paper is 54.3. 71.8 is the FID score reported in the SPADE paper (https://arxiv.org/abs/1903.07291).

Ha0Tang commented 4 years ago

@justin-hpcnt Do you know how to train on 8 GPUs? Thanks a lot.

kravrolens commented 2 years ago

@ZzzackChen If you ignore the 255 labels, the result will be 93 as you calculated. If you count 255, the result will be 82.3. To keep consistent with the SPADE paper (https://arxiv.org/pdf/1903.07291.pdf), I chose the second calculation method for the Cityscapes dataset. For the COCO-Stuff and ADE20K datasets, the pixel accuracy calculation is included in the evaluation code, and I used the calculation method from the original code.

@xh-liu How do you count 255 in the result when choosing the second (DRN) calculation method for the Cityscapes dataset? Thanks.