CAM mIoU

arnike commented 5 years ago

Hi Jiwoon,

thanks for sharing this nice work! I'm trying to generate the CAMs with a model I trained myself, but the mIoU I get is quite low, 42.28 (on PASCAL VOC2012 / train). I first ran

python3 train_cls.py --lr 0.1 --batch_size 16 --max_epoches 15 --crop_size 448 --network network.resnet38_cls --voc12_root ./data --weights weights/ilsvrc-cls_rna-a1_cls1000_ep-0001.pth --wt_dec 5e-4

and then generated the CAMs with

python3 infer_cls.py --infer_list voc12/train.txt --voc12_root ./data --network network.resnet38_cls --weights res38_cls.pth --out_cam_pred ./cams

Is there anything amiss? To reach the 48% mIoU reported in the paper, should I apply dCRF? I don't apply it here.

More details, in case it helps:

Class	#	IoU	Precision	Recall
background	10581	71.6	83.4	83.9
aeroplane	586	37.8	40.8	93.8
bicycle	485	43.6	50.3	86.1
bird	698	34.1	39.6	86.1
boat	460	28.1	35.1	79.0
bottle	651	27.0	31.9	81.9
bus	385	61.3	77.2	79.3
car	1079	39.9	50.1	81.1
cat	1000	43.8	73.3	57.9
chair	1063	34.0	44.1	68.3
cow	262	42.2	54.8	78.0
diningtable	520	39.2	54.6	65.4
dog	1176	41.9	66.4	63.4
horse	444	40.4	58.4	65.6
motorbike	481	51.9	62.8	83.0
person	3876	37.2	50.5	66.7
potted-plant	485	35.0	45.8	76.8
sheep	299	43.8	50.6	86.4
sofa	474	43.5	65.9	60.0
train	499	52.6	65.9	79.6
tv/monitor	548	38.9	45.3	87.4
ambiguous	330	0.0	0.0	0.0

mIoU: 42.28 (background included)

Best, Nikita
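As a quick sanity check on the table above, the reported 42.28 appears to be the plain average of the 21 per-class IoUs (background plus 20 object classes), with the "ambiguous" row left out. The snippet below is only a sketch that re-enters the numbers from the table by hand; it is not the script that produced them.

```python
# Per-class IoU values copied from the table in this comment,
# background first; the "ambiguous" row is deliberately excluded.
ious = [
    71.6, 37.8, 43.6, 34.1, 28.1, 27.0, 61.3,
    39.9, 43.8, 34.0, 42.2, 39.2, 41.9, 40.4,
    51.9, 37.2, 35.0, 43.8, 43.5, 52.6, 38.9,
]

# Unweighted mean over classes, i.e. every class counts equally
# regardless of how many images contain it.
miou = sum(ious) / len(ious)
print(f"mIoU over {len(ious)} classes: {miou:.2f}")  # mIoU over 21 classes: 42.28
```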

jiwoon-ahn commented 5 years ago

Hi @arnike, you don't need to apply dCRF to get 48% mIoU. It seems you evaluated on the PASCAL VOC 2012 + SBD augmented dataset, which is not the setting reported in the paper. Could you try again on the PASCAL VOC 2012 train set only? (That set should contain fewer than 1,500 images.)

arnike commented 5 years ago

The numbers are indeed from the VOC + SBD train set. On VOC only (1464 images) I get mIoU: 38.44. Are the models in Table 1 also trained on VOC only? I trained on VOC + SBD. I also wondered about the way the background (BG) is computed in infer_cls.py, which is different from Eq. 2. I then tried Eq. 2 instead, with alpha=16, but got similar results.

Some more details on training, if this can help. These are the last training steps:

Iter: 9550/ 9915 Loss:0.0229 imps:3.8 Fin:Fri Feb 15 05:23:17 2019 lr: 0.0051
Iter: 9600/ 9915 Loss:0.0261 imps:3.8 Fin:Fri Feb 15 05:23:05 2019 lr: 0.0045
Iter: 9650/ 9915 Loss:0.0271 imps:3.8 Fin:Fri Feb 15 05:22:52 2019 lr: 0.0038
Iter: 9700/ 9915 Loss:0.0262 imps:3.8 Fin:Fri Feb 15 05:22:40 2019 lr: 0.0032
Iter: 9750/ 9915 Loss:0.0285 imps:3.8 Fin:Fri Feb 15 05:22:28 2019 lr: 0.0025
Iter: 9800/ 9915 Loss:0.0289 imps:3.8 Fin:Fri Feb 15 05:22:16 2019 lr: 0.0018
Iter: 9850/ 9915 Loss:0.0220 imps:3.8 Fin:Fri Feb 15 05:22:04 2019 lr: 0.0011
Iter: 9900/ 9915 Loss:0.0270 imps:3.8 Fin:Fri Feb 15 05:21:52 2019 lr: 0.0003

validating ... loss: 0.04670578511431813

The classification mAP reached is around 93.0 %.

Thanks @jiwoon-ahn for any tips.

XZNWU commented 5 years ago

I don't know how to compute mIoU. Could you share your mIoU code, if that is convenient for you? Thank you! @arnike

jiwoon-ahn commented 5 years ago

I'm not sure... The loss and the mAP you got seem fine to me.

The network is trained with PASCAL VOC 2012 train + SBD dataset, but it is evaluated on PASCAL VOC 2012 train set.
We didn't explain how to measure the performance of CAMs as it is trivial, and thresholding the bottom 20% is a common practice. But as you mentioned, using alpha=16 can give similar results.
The accuracy of CAMs can vary depending upon the threshold. I think it is worth trying to adjust it.
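The two background rules discussed here can be compared side by side. The sketch below rests on two assumptions that the thread does not spell out: that "thresholding the bottom 20%" means giving the background a constant score of 0.2 on CAMs normalised to [0, 1], and that Eq. 2 has the form (1 - max_c M_c)^alpha. The function names are hypothetical and are not taken from infer_cls.py.

```python
import numpy as np

def labels_constant_bg(norm_cam, bg_score=0.2):
    """Assumed thresholding rule: background gets a constant score.

    norm_cam: float array of shape (C, H, W), each class map in [0, 1].
    Returns an (H, W) label map where 0 is background and c + 1 is class c.
    """
    bg = np.full((1,) + norm_cam.shape[1:], bg_score, dtype=norm_cam.dtype)
    return np.argmax(np.concatenate([bg, norm_cam], axis=0), axis=0)

def labels_power_bg(norm_cam, alpha=16):
    """Assumed form of Eq. 2: background = (1 - max over classes) ** alpha."""
    bg = (1.0 - norm_cam.max(axis=0, keepdims=True)) ** alpha
    return np.argmax(np.concatenate([bg, norm_cam], axis=0), axis=0)

# Toy CAMs for two classes on a 1 x 3 image (made-up values).
cam = np.array([
    [[0.90, 0.10, 0.05]],  # class 0
    [[0.00, 0.15, 0.30]],  # class 1
])

print(labels_constant_bg(cam))       # [[1 0 2]]
print(labels_power_bg(cam, alpha=16))  # [[1 2 2]]
```

The middle pixel shows where the two rules differ: its strongest activation (0.15) falls below the constant 0.2, but it still beats the power-law background score of roughly 0.07. Sweeping `bg_score` or `alpha` is one way to try the threshold adjustment suggested above.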

arnike commented 5 years ago

Thanks @jiwoon-ahn for the tips! I just realised that I might not be computing IoUs correctly (thanks @xiangzhang06 for the hint :-). I computed IoU image-wise, but VOC's way is to accumulate the statistics dataset-wise (i.e. count TPs, FPs and FNs in a confusion matrix and compute the Jaccard index from that). If I do this, I get mIoU 46.72 / 46.80 on train / val, which matches the score on val, but is still about 1.3 off on train.
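The difference between the two ways of computing IoU can be sketched as follows. This is a minimal illustration, not the VOC devkit code: the function names are hypothetical, the ignore label 255 is an assumption, and `imagewise_miou` is just one possible image-wise variant, since the exact one used earlier in this thread is not shown.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    mask = gt != ignore_label  # skip pixels marked as ambiguous (assumed 255)
    idx = gt[mask].astype(np.int64) * num_classes + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_from_confusion(conf):
    """Per-class Jaccard index TP / (TP + FP + FN); NaN if a class never occurs."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

def dataset_miou(gts, preds, num_classes):
    """Dataset-wise: sum the confusion matrices first, then compute IoU once."""
    total = np.zeros((num_classes, num_classes), dtype=np.int64)
    for gt, pred in zip(gts, preds):
        total += confusion_matrix(gt, pred, num_classes)
    return float(np.nanmean(iou_from_confusion(total)))

def imagewise_miou(gts, preds, num_classes):
    """Image-wise variant: IoU per image, averaged per class, then over classes."""
    per_image = np.stack([
        iou_from_confusion(confusion_matrix(gt, pred, num_classes))
        for gt, pred in zip(gts, preds)
    ])
    return float(np.nanmean(np.nanmean(per_image, axis=0)))

# Two toy "images" with two classes (made-up labels).
gts = [np.array([0, 0, 0, 1]), np.array([1, 1, 1, 1])]
preds = [np.array([0, 0, 0, 0]), np.array([1, 1, 1, 1])]

print(f"{dataset_miou(gts, preds, 2):.3f}")    # 0.775
print(f"{imagewise_miou(gts, preds, 2):.3f}")  # 0.625
```

In the toy data a single missed pixel of class 1 gives that class an IoU of 0 for the whole first image, which drags the image-wise average down; in the accumulated confusion matrix the same pixel is only one false negative out of five.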

@jiwoon-ahn Do you think this difference is due to ending up in different local minima? I didn't experiment with it, but I presume there can be tangible fluctuations in IoU (± 1%) between different runs of SGD; after all, the net is not trained to do segmentation.

@xiangzhang06 Your question is off-topic, but do take a look at VOC devkit VOCcode/VOCevalseg.m.

XZNWU commented 5 years ago

Thanks @arnike

jiwoon-ahn commented 5 years ago

@arnike, glad you solved the problem! I agree with you. As far as I know, all existing weakly supervised techniques using class attention share this issue, as there are no ground truths to guide the network.

arnike commented 5 years ago

@jiwoon-ahn Thanks, I'll keep that in mind ;)

jiwoon-ahn / psa

CAM mIoU #13