won-bae closed this issue 5 years ago
Hi won-bae,
I am sorry for the confusion. Actually, CAM and Grad-CAM are almost the same if we extract the heatmap from the penultimate convolutional layer of a GAP-based architecture (see the Grad-CAM paper). The only difference between them is that Grad-CAM applies ReLU to the heatmap. So I implemented Grad-CAM and removed the ReLU layer, and I think this is equivalent to CAM.
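To make this equivalence concrete: for a GAP-then-FC head, the gradient of the class score with respect to each feature map is constant and proportional to that channel's FC weight, so Grad-CAM's pooled-gradient weights reduce to the FC weights up to a positive scale. A minimal numpy sketch of that argument (not the repo's TensorFlow code; the function names are mine):

```python
import numpy as np

def cam(features, fc_weights, class_idx):
    """CAM: weight the conv feature maps by the FC weights of the class.

    features:   (C, H, W) activations of the last conv layer
    fc_weights: (num_classes, C) weights of the FC layer after GAP
    """
    w = fc_weights[class_idx]                  # (C,)
    return np.tensordot(w, features, axes=1)   # (H, W)

def grad_cam(features, fc_weights, class_idx, relu=True):
    """Grad-CAM for a GAP->FC head.

    For score s = sum_c w_c * mean_{hw} A_c, the gradient of s w.r.t.
    A_c(h, w) is w_c / (H*W), so the GAP of the gradients is w_c / (H*W):
    Grad-CAM's channel weights are the FC weights up to a positive factor.
    """
    _, h, w = features.shape
    alphas = fc_weights[class_idx] / (h * w)   # analytic pooled gradients
    heatmap = np.tensordot(alphas, features, axes=1)
    return np.maximum(heatmap, 0) if relu else heatmap

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 7, 7))
fc_w = rng.standard_normal((10, 4))

# Without ReLU, Grad-CAM equals CAM up to the positive factor 1/(H*W):
assert np.allclose(grad_cam(feats, fc_w, 3, relu=False) * 49,
                   cam(feats, fc_w, 3))
```

Since the factor 1/(H*W) is positive, applying ReLU or rescaling does not change which regions the heatmap highlights, which is why dropping ReLU recovers CAM exactly.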
I see. That makes sense. Thanks for your reply!
Btw, I've tried to reproduce the results you reported in the paper for VGG and ResNet on both CUB200 and ImageNet. Unfortunately, I could not reproduce any of them. Here are some results I got on CUB200 after 2-3 trials for each backbone, along with the parameters:
VGG ADL CUB200: top1 acc - 65.xx, top1 loc - 45.xx (base-lr: 0.01, batch: 128, attdrop: 3 4 53, threshold: 0.8, keep_prob: 0.25)
Resnet ADL CUB200: top1 acc - 79.xx, top1 loc - 56.xx (base-lr: 0.01, batch: 100, attdrop: 31 41 5, threshold: 0.9, keep_prob: 0.25)
For Resnet, the batch size is 100 since 128 doesn't fit on the GPU I have. It doesn't seem that the discrepancy between the results reported in the paper and the results I got is due to randomness. Is there anything I should have taken into account other than the parameters above? Any help would be appreciated. Thank you!
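For context on the attdrop/threshold/keep_prob knobs: as I understand the ADL paper, each instrumented layer computes a channel-averaged self-attention map and then, during training, stochastically applies either a drop mask (erasing the most discriminative region above threshold * max) or a sigmoid importance map. A minimal numpy sketch under that reading (this is not the repo's TensorFlow code, and reading keep_prob as the probability of choosing the importance map over the drop mask is my assumption):

```python
import numpy as np

def adl(feature_map, drop_threshold=0.8, keep_prob=0.25,
        training=True, rng=None):
    """Sketch of an Attention-based Dropout Layer (ADL) forward pass.

    feature_map: (C, H, W). At each training step the layer applies either
    a drop mask (zeroing the most discriminative region) or an importance
    map (re-weighting by sigmoid attention). At eval time it is identity.
    """
    if not training:
        return feature_map
    rng = rng or np.random.default_rng()
    attention = feature_map.mean(axis=0)            # (H, W) self-attention
    # Drop mask: erase pixels whose attention exceeds a fraction of the max.
    drop_mask = (attention < drop_threshold * attention.max())
    drop_mask = drop_mask.astype(feature_map.dtype)
    importance = 1.0 / (1.0 + np.exp(-attention))   # sigmoid importance map
    # Stochastic choice between the two maps (interpretation assumed here).
    mask = importance if rng.random() < keep_prob else drop_mask
    return feature_map * mask                       # broadcast over channels
```

This is only meant to illustrate what the three hyperparameters control; the exact selection rule in the released code may differ.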
Btw, the only change I made was getting rid of the if condition at https://github.com/junsukchoe/ADL/blob/ae0ba8c071a8723dc7042bd845536c447d29a3eb/Tensorflow/models_vgg.py#L56
1) VGG
Unfortunately, I cannot access my lab computer, which has the exact code and experimental settings for the VGG experiments. I can probably look into this issue more thoroughly after November.
2) ResNet
I've just updated the model code to match the submission version. It is less effective for classification, but better for localization. The paper score can probably be reproduced now. You can use this:
python CAM-resnet.py --gpu 0 --data /CUB200/ --cub --base-lr 0.1 --logdir ResNet50SE_CUB --load ResNet --stepscale 5.0 --batch 128 --depth 50 --mode se --attdrop 31 41 5 --keep_prob 0.25 --threshold 0.90
However, unfortunately, I have no resources to test this change now. After the CVPR 2020 deadline, I will clean up the released code and upload pre-trained models. Sorry for the delay.
Sorry to keep bothering you, but it seems there is no arg called 'preserve'. Can you explain what it is?
It is a data pre-processing option. If args.preserve == True, only the shortest edge of the image is resized to 256. But I did not use it for the experiments in the paper. I should have removed it but missed it. Sorry for the confusion.
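For reference, resizing only the shortest edge to 256 preserves the aspect ratio, unlike a plain 256x256 resize. A tiny sketch of the output-size computation implied by that description (the helper name is mine, not from the repo):

```python
def shortest_edge_resize_dims(width, height, target=256):
    """Output (width, height) when only the shortest edge is resized to
    `target`, preserving aspect ratio -- my reading of what the unused
    `preserve` flag was meant to do."""
    if width <= height:
        return target, round(height * target / width)
    return round(width * target / height), target

# A landscape 500x375 image keeps its aspect ratio:
print(shortest_edge_resize_dims(500, 375))  # -> (341, 256)
```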
I've just cleaned up the code. It probably works well now.
@junsukchoe Thank you so much for sharing the code. Unfortunately, I'm still not able to reproduce the result you got for resnet50 on CUB200. The highest top1 acc I got was 59.xx with threshold=0.2. In fact, with threshold=0.1, I was able to get 62.xx. Is the result reported in the paper based on threshold=0.1? If not, can you please confirm whether you can reproduce the result using the code you shared? I really appreciate your help.
No, I used 0.2 threshold. Could you share the train log with me?
Sorry it took me some time to rerun the code since I had deleted a folder. The log doesn't show the final results, so I summarized them below.
CAM Threshold: 0.1 GT-known Loc: 0.759233690024163 Top-1 Loc: 0.6223679668622714 Top-1 Acc: 0.7961684501208146
CAM Threshold: 0.15 GT-known Loc: 0.7447359337245426 Top-1 Loc: 0.6063168795305488 Top-1 Acc: 0.7961684501208146
CAM Threshold: 0.2 GT-known Loc: 0.7098722816706938 Top-1 Loc: 0.5726613738350017 Top-1 Acc: 0.7961684501208146
CAM Threshold: 0.25 GT-known Loc: 0.670003451846738 Top-1 Loc: 0.5409043838453572 Top-1 Acc: 0.7961684501208146
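The trend in the sweep above (lower threshold, higher GT-known Loc) is what you'd expect from how the box is extracted: binarizing the heatmap at a lower threshold keeps a larger region and thus a larger box, which can raise IoU against large objects such as CUB birds. A minimal numpy sketch of the standard extraction and IoU check (not the repo's exact code; function names are mine):

```python
import numpy as np

def cam_to_bbox(heatmap, threshold):
    """Binarize a min-max-normalized CAM at `threshold` and return the
    tight bounding box (x0, y0, x1, y1) of the surviving region."""
    hm = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
    ys, xs = np.where(hm >= threshold)
    if len(xs) == 0:                      # nothing survives: full image
        return 0, 0, hm.shape[1] - 1, hm.shape[0] - 1
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def iou(a, b):
    """IoU of two boxes (x0, y0, x1, y1), inclusive pixel coordinates."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / (area(a) + area(b) - inter)

# A lower threshold keeps more pixels, so the predicted box grows:
hm = np.zeros((8, 8))
hm[2:5, 3:6] = 1.0
print(cam_to_bbox(hm, 0.5))   # -> (3, 2, 5, 4)
print(cam_to_bbox(hm, 0.0))   # -> (0, 0, 7, 7)
```

GT-known Loc then counts images where this box has IoU >= 0.5 with the ground-truth box (given the true class), while Top-1 Loc additionally requires the classification to be correct, which is why Top-1 Acc stays constant across thresholds while the localization numbers move.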
I checked the log and found that it may not be the result produced by the latest code. Please try again with the latest version.
ps. The latest version sets the number of fully connected layer nodes to 1,000.
Yeah, you're right, but since I was running it on CUB200, I changed the final dimension to args.classnum. Does it have to be 1000? If so, can you explain why?
By mistake, I set the final dimension to 1000 for the CUB experiments in the submission version. It seems to increase WSOL accuracy, but I don't know the exact reason for now.
Note that I set the final dimension to 200 for other backbone networks.
Hi,
I have a question about the CAM method in the TensorFlow implementation. Although you mentioned in the paper that you employed CAM [63], I don't see that it is actually functional. The code only seems to allow Grad-CAM. Am I missing something, or is there a particular reason it's not functional?
Thank you