junsukchoe / ADL

Attention-based Dropout Layer for Weakly Supervised Object Localization (CVPR 2019 Oral)
MIT License

CAM in tensorflow implementation #9

Closed won-bae closed 5 years ago

won-bae commented 5 years ago

Hi,

I have a question about the CAM method in the TensorFlow implementation. Although you mention in the paper that you employed CAM [63], I don't see that it is actually functional in the code; as far as I can tell, the code only supports Grad-CAM. Am I missing something, or is there a particular reason it is not functional?

Thank you

junsukchoe commented 5 years ago

Hi won-bae,

Sorry for the confusion. CAM and Grad-CAM are almost the same if we extract the heatmap from the penultimate convolutional layer of a GAP-based architecture (see the Grad-CAM paper). The only difference between them is that Grad-CAM applies a ReLU to the heatmap. So I implemented Grad-CAM and removed the ReLU layer, which I think is equivalent to CAM.
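The equivalence claimed above can be checked numerically. The following is a minimal NumPy sketch, not the repository's TensorFlow code: the shapes, the random features, and the names `features`, `fc_weights`, and `class_score` are all assumptions made for illustration. It builds a GAP + fully connected head, computes the CAM heatmap from the FC weights, computes a Grad-CAM heatmap without the ReLU from numerically estimated gradients, and compares the two.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K feature maps of H x W pixels, C classes.
K, H, W, C = 8, 7, 7, 5
features = rng.normal(size=(K, H, W))  # feature maps that feed the GAP layer
fc_weights = rng.normal(size=(C, K))   # FC layer applied after GAP
target = 2                             # class to visualize

def class_score(feats, cls):
    """Score of a GAP + FC head: y_c = sum_k w_ck * mean_ij(A^k_ij)."""
    return fc_weights[cls] @ feats.mean(axis=(1, 2))

# CAM: weight each feature map by the FC weight of the target class.
cam = np.tensordot(fc_weights[target], features, axes=1)  # shape (H, W)

# Grad-CAM without the ReLU: channel weights are the spatially averaged
# gradients of the class score with respect to the feature maps. The score
# is linear in the features, so central differences recover the gradient
# up to floating-point error.
eps = 1e-3
grads = np.zeros_like(features)
for idx in np.ndindex(*features.shape):
    bump = np.zeros_like(features)
    bump[idx] = eps
    grads[idx] = (class_score(features + bump, target)
                  - class_score(features - bump, target)) / (2 * eps)
alpha = grads.mean(axis=(1, 2))                           # shape (K,)
grad_cam_no_relu = np.tensordot(alpha, features, axes=1)  # shape (H, W)

# The two maps differ only by the constant factor 1 / (H * W).
print(np.allclose(grad_cam_no_relu * H * W, cam, atol=1e-6))
```

In this setting the two heatmaps differ only by the constant factor 1 / (H * W), which disappears once the heatmap is min-max normalized before thresholding.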

won-bae commented 5 years ago

I see. That makes sense. Thanks for your reply!

Btw, I've tried to reproduce the results you reported in the paper for VGG and ResNet on both CUB200 and ImageNet. Unfortunately, I could not reproduce any of them. Here are the results I got on CUB200 after 2 to 3 trials for each backbone, along with the parameters:

VGG ADL CUB200: top1 acc - 65.xx, top1 loc - 45.xx
Parameters: base-lr - 0.01, batch - 128, attdrop - 3 4 53, threshold - 0.8, keep_prob - 0.25

ResNet ADL CUB200: top1 acc - 79.xx, top1 loc - 56.xx
Parameters: base-lr - 0.01, batch - 100, attdrop - 31 41 5, threshold - 0.9, keep_prob - 0.25

For ResNet, the batch size is 100 since 128 doesn't fit on my GPU. The discrepancy between the results reported in the paper and the results I got does not seem to be due to randomness. Is there anything I should have taken into account other than the parameters above? Any help would be appreciated. Thank you!

Btw, the only change I made was removing the if condition at https://github.com/junsukchoe/ADL/blob/ae0ba8c071a8723dc7042bd845536c447d29a3eb/Tensorflow/models_vgg.py#L56

junsukchoe commented 5 years ago

1) VGG

Unfortunately, I cannot access my lab computer, which has the exact code and experimental settings for the VGG experiments. I can probably inspect this issue more thoroughly after November.

2) ResNet

I've just updated the model code to match the submission version. It is less effective for classification, but good for localization. The score in the paper can probably be reproduced now. You can use this:

python CAM-resnet.py --gpu 0 --data /CUB200/ --cub --base-lr 0.1 --logdir ResNet50SE_CUB --load ResNet --stepscale 5.0 --batch 128 --depth 50 --mode se --attdrop 31 41 5 --keep_prob 0.25 --threshold 0.90

Unfortunately, I have no resources to test this change right now. After the CVPR 2020 deadline, I will clean up the released code and upload pre-trained models. Sorry for the delay.

won-bae commented 5 years ago

Sorry to keep bothering you, but it seems there is no argument called 'preserve'. Can you explain what that is?

junsukchoe commented 5 years ago

It is a data pre-processing option. If args.preserve == True, only the shortest edge of the image is resized to 256. But I did not use it for the experiments in the paper. I should have removed it, but I missed it. Sorry for the confusion.
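For readers unfamiliar with this kind of resizing, here is a small sketch of the size computation it implies. It assumes that the aspect ratio is kept, so the longer edge is scaled by the same factor as the shorter one; the helper name `shortest_edge_size` and the rounding rule are hypothetical and not taken from the repository.

```python
def shortest_edge_size(height, width, target=256):
    """Return (new_height, new_width) so that the shorter edge equals `target`.

    Assumption: the aspect ratio is preserved, so the longer edge is scaled
    by the same factor and rounded to the nearest pixel.
    """
    scale = target / min(height, width)
    new_h = target if height <= width else round(height * scale)
    new_w = target if width <= height else round(width * scale)
    return new_h, new_w


# A 480 x 640 (height x width) image: the 480-pixel edge becomes 256.
print(shortest_edge_size(480, 640))
```

Under this assumption the output is generally not square, so a crop to the network's input size would still be needed afterwards.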

junsukchoe commented 5 years ago

I've just cleaned up the code. It probably works well now.

won-bae commented 5 years ago

@junsukchoe Thank you so much for sharing the code. Unfortunately, I'm still not able to reproduce the result you got for ResNet50 on CUB200. The highest top1 loc I got was 59.xx with threshold=0.2. In fact, with threshold=0.1, I was able to get 62.xx. Is the result you reported in the paper based on threshold=0.1? If not, can you please confirm whether you can reproduce the result using the code you shared? I really appreciate your help.

junsukchoe commented 5 years ago

No, I used 0.2 threshold. Could you share the train log with me?

won-bae commented 5 years ago

Sorry, it took me some time to rerun the code since I had deleted a folder. The log doesn't show the final results, so I summarized them below.

Results

CAM Threshold: 0.1 GT-known Loc: 0.759233690024163 Top-1 Loc: 0.6223679668622714 Top-1 Acc: 0.7961684501208146

CAM Threshold: 0.15000000000000002 GT-known Loc: 0.7447359337245426 Top-1 Loc: 0.6063168795305488 Top-1 Acc: 0.7961684501208146

CAM Threshold: 0.2 GT-known Loc: 0.7098722816706938 Top-1 Loc: 0.5726613738350017 Top-1 Acc: 0.7961684501208146

CAM Threshold: 0.25 GT-known Loc: 0.670003451846738 Top-1 Loc: 0.5409043838453572 Top-1 Acc: 0.7961684501208146
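As a reading aid for the three metrics above, the sketch below shows how they are commonly computed in weakly supervised object localization. It is a simplified illustration, not the repository's evaluation code: the function names are hypothetical, boxes are inclusive pixel coordinates (x0, y0, x1, y1), and the box is drawn around all pixels above the threshold, whereas a real evaluation script may differ (for example by keeping only the largest connected region).

```python
import numpy as np

def bbox_from_heatmap(heatmap, threshold):
    """Box around all pixels whose min-max normalized value exceeds threshold."""
    lo, hi = heatmap.min(), heatmap.max()
    norm = (heatmap - lo) / (hi - lo + 1e-12)
    ys, xs = np.nonzero(norm > threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(0, iw) * max(0, ih)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

def localization_hits(heatmap, gt_box, pred_label, gt_label, threshold):
    """Return (gt_known_hit, top1_loc_hit) for a single image."""
    box = bbox_from_heatmap(heatmap, threshold)
    gt_known = box is not None and iou(box, gt_box) >= 0.5
    top1_loc = gt_known and pred_label == gt_label
    return bool(gt_known), bool(top1_loc)

# Toy example: a 10 x 10 heatmap with one active rectangle.
heatmap = np.zeros((10, 10))
heatmap[2:6, 3:8] = 1.0
print(localization_hits(heatmap, (3, 2, 7, 6), pred_label=4, gt_label=4, threshold=0.2))
print(localization_hits(heatmap, (3, 2, 7, 6), pred_label=1, gt_label=4, threshold=0.2))
```

With these definitions, Top-1 Acc does not depend on the CAM threshold, which matches the constant value in the results above, and Top-1 Loc can never exceed either GT-known Loc or Top-1 Acc.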

Log file

log.log

junsukchoe commented 5 years ago

I checked the log and found that it may not have been produced by the latest code. Please try again with the latest version.

P.S. The latest version sets the number of nodes in the fully connected layer to 1,000.

won-bae commented 5 years ago

Yeah, you're right, but since I was running it on CUB200, I changed the final dimension to 'args.classnum'. Does it have to be 1000? If so, can you explain why?

junsukchoe commented 5 years ago

By mistake, I set the final dimension to 1000 for the CUB experiments in the submission version. It seems to increase WSOL accuracy, but I don't know the exact reason yet.

Note that I set the final dimension to 200 for other backbone networks.