jiwoon-ahn / psa

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018
MIT License

CAM mIoU #13

Closed by arnike 5 years ago

arnike commented 5 years ago

Hi Jiwoon,

thanks for sharing this nice work! I'm trying to generate the CAMs with a model I trained myself, but the mIoU I get is quite low, 42.28 (on PASCAL VOC2012 / train). I first ran

python3 train_cls.py --lr 0.1 --batch_size 16 --max_epoches 15 --crop_size 448 --network network.resnet38_cls --voc12_root ./data --weights weights/ilsvrc-cls_rna-a1_cls1000_ep-0001.pth --wt_dec 5e-4

and then generated the CAMs with

python3 infer_cls.py --infer_list voc12/train.txt --voc12_root ./data --network network.resnet38_cls --weights res38_cls.pth --out_cam_pred ./cams

Is there anything amiss? To reach the 48% mIoU reported in the paper, should I apply dCRF? I did not apply it here.

More details, in case it helps:

Class          #       IoU    Precision  Recall
background     10581   71.6   83.4       83.9
aeroplane      586     37.8   40.8       93.8
bicycle        485     43.6   50.3       86.1
bird           698     34.1   39.6       86.1
boat           460     28.1   35.1       79.0
bottle         651     27.0   31.9       81.9
bus            385     61.3   77.2       79.3
car            1079    39.9   50.1       81.1
cat            1000    43.8   73.3       57.9
chair          1063    34.0   44.1       68.3
cow            262     42.2   54.8       78.0
diningtable    520     39.2   54.6       65.4
dog            1176    41.9   66.4       63.4
horse          444     40.4   58.4       65.6
motorbike      481     51.9   62.8       83.0
person         3876    37.2   50.5       66.7
potted-plant   485     35.0   45.8       76.8
sheep          299     43.8   50.6       86.4
sofa           474     43.5   65.9       60.0
train          499     52.6   65.9       79.6
tv/monitor     548     38.9   45.3       87.4
ambiguous      330     0.0    0.0        0.0

mIoU: 42.28 (background included)

Best, Nikita

jiwoon-ahn commented 5 years ago

Hi @arnike, you don't need to apply dCRF to get 48% mIoU. It seems you evaluated on the PASCAL VOC 2012 + SBD augmented dataset, which is not the setting reported in the paper. Could you try again on the PASCAL VOC 2012 train set only? (That set should contain fewer than 1500 images.)

arnike commented 5 years ago

The numbers are indeed from the VOC + SBD train set. On VOC only (1464 images) I get mIoU: 38.44. Are the models in Table 1 also trained on VOC only? I trained on VOC + SBD. I also wondered about the way the background (BG) score is computed in infer_cls.py, which differs from Eq. 2. I then tried Eq. 2 instead, with alpha=16, but got similar results.
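For readers following along, here is a minimal sketch of the Eq. 2 style background score mentioned above. It assumes Eq. 2 has the form M_bg(x, y) = (1 - max_c M_c(x, y))^alpha over CAMs normalised to [0, 1]; the function name and array layout are hypothetical and are not taken from infer_cls.py.

```python
import numpy as np

def predict_with_power_background(norm_cam, alpha=16.0):
    """Label each pixel, with background scored as (1 - max_c CAM)^alpha.

    norm_cam: float array of shape (C, H, W) with values in [0, 1]
              (assumed layout; channels of absent classes set to 0).
    Returns an (H, W) label map where 0 is background and k >= 1 is
    the k-th foreground channel.
    """
    # Background is likely where no foreground class responds strongly.
    bg = np.power(1.0 - np.max(norm_cam, axis=0, keepdims=True), alpha)
    scores = np.concatenate((bg, norm_cam), axis=0)
    return np.argmax(scores, axis=0)

# Tiny example: 2 foreground channels, 1 x 2 pixels.
cams = np.array([[[0.00, 0.90]],
                 [[0.05, 0.10]]])
labels = predict_with_power_background(cams, alpha=16.0)
print(labels)
```

A larger alpha shrinks the background score faster as the strongest foreground activation grows, so more pixels end up labelled as foreground.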

Some more details on training, if this can help. These are the last training steps:

Iter: 9550/ 9915 Loss:0.0229 imps:3.8 Fin:Fri Feb 15 05:23:17 2019 lr: 0.0051
Iter: 9600/ 9915 Loss:0.0261 imps:3.8 Fin:Fri Feb 15 05:23:05 2019 lr: 0.0045
Iter: 9650/ 9915 Loss:0.0271 imps:3.8 Fin:Fri Feb 15 05:22:52 2019 lr: 0.0038
Iter: 9700/ 9915 Loss:0.0262 imps:3.8 Fin:Fri Feb 15 05:22:40 2019 lr: 0.0032
Iter: 9750/ 9915 Loss:0.0285 imps:3.8 Fin:Fri Feb 15 05:22:28 2019 lr: 0.0025
Iter: 9800/ 9915 Loss:0.0289 imps:3.8 Fin:Fri Feb 15 05:22:16 2019 lr: 0.0018
Iter: 9850/ 9915 Loss:0.0220 imps:3.8 Fin:Fri Feb 15 05:22:04 2019 lr: 0.0011
Iter: 9900/ 9915 Loss:0.0270 imps:3.8 Fin:Fri Feb 15 05:21:52 2019 lr: 0.0003

validating ... loss: 0.04670578511431813
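As a sanity check on the log above, the printed learning rates are consistent with a polynomial decay from the initial --lr 0.1 over 9915 steps. The sketch below assumes an exponent of 0.9, which is inferred from the logged numbers and not confirmed against the repository's optimizer; `poly_lr` is a hypothetical helper name.

```python
def poly_lr(base_lr, step, max_step, power=0.9):
    """Polynomial learning-rate decay (the exponent 0.9 is an assumption)."""
    return base_lr * (1.0 - step / max_step) ** power

# Iteration -> learning rate as printed in the training log above.
logged = {
    9550: "0.0051", 9600: "0.0045", 9650: "0.0038", 9700: "0.0032",
    9750: "0.0025", 9800: "0.0018", 9850: "0.0011", 9900: "0.0003",
}

for step, shown in logged.items():
    computed = f"{poly_lr(0.1, step, 9915):.4f}"
    print(step, computed, shown)
```

If the computed and logged columns agree, the schedule decayed as expected and is unlikely to explain the mIoU gap.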

The classification mAP reaches around 93.0%.

Thanks @jiwoon-ahn for any tips.

XZNWU commented 5 years ago

I don't know how to compute the mIoU. Could you share your mIoU code, if that's convenient for you? Thank you! @arnike

jiwoon-ahn commented 5 years ago

I'm not sure... The loss and the mAP you got seem fine to me.

arnike commented 5 years ago

Thanks @jiwoon-ahn for the tips! I just realised that I might not have been computing IoUs correctly (thanks @xiangzhang06 for the hint :-). I computed IoU image-wise, but VOC's way is to accumulate the statistics dataset-wise (i.e. count TPs, FPs and FNs in a single confusion matrix and compute the Jaccard index from that). If I do this, I get mIoU 46.72 / 46.80 on train / val, which matches the score on val but is still about 1.3 points off on train.
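To illustrate the difference between the two ways of averaging, here is a minimal NumPy sketch. It is not the VOC devkit code: the function names are hypothetical, label 255 is assumed to mark ignored ("void") pixels in the ground truth, predictions are assumed to lie in [0, num_classes), and classes that never occur are left out of the mean.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    valid = gt != ignore_index  # drop void pixels
    idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def dataset_miou(pairs, num_classes=21):
    """Accumulate one confusion matrix over all images, then take IoU."""
    total = np.zeros((num_classes, num_classes), dtype=np.int64)
    for gt, pred in pairs:
        total += confusion_matrix(gt, pred, num_classes)
    tp = np.diag(total)
    fp = total.sum(axis=0) - tp  # predicted as class c, labelled otherwise
    fn = total.sum(axis=1) - tp  # labelled class c, predicted otherwise
    denom = tp + fp + fn
    iou = tp / np.maximum(denom, 1)
    return float(iou[denom > 0].mean())

def imagewise_miou(pairs, num_classes=21):
    """Compute mIoU per image and average the results (not VOC's way)."""
    return float(np.mean([dataset_miou([p], num_classes) for p in pairs]))

# Two toy "images" with two classes.
pairs = [
    (np.array([0, 0, 0, 1]), np.array([0, 0, 1, 1])),
    (np.array([1, 1, 1, 1]), np.array([1, 1, 1, 1])),
]
print(dataset_miou(pairs, num_classes=2))
print(imagewise_miou(pairs, num_classes=2))
```

On this toy input the two functions return different values, because the image-wise average gives every image equal weight regardless of how many pixels of each class it contains.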

@jiwoon-ahn Do you think this difference is due to ending up in different local minima? I didn't experiment with it, but I presume there can be tangible fluctuations in IoU (± 1%) between different runs of SGD; after all, the net is not trained to do segmentation.

@xiangzhang06 Your question is off-topic, but do take a look at VOC devkit VOCcode/VOCevalseg.m.

XZNWU commented 5 years ago

Thanks @arnike

jiwoon-ahn commented 5 years ago

@arnike, glad you solved the problem! I agree with you. As far as I know, all existing weakly supervised techniques that use class attention share this issue, since there is no ground truth to guide the network.

arnike commented 5 years ago

@jiwoon-ahn Thanks, I'll keep that in mind ;)