linyq2117 / CLIP-ES

MIT License

CAM performance #15

Closed: muchengxue0911 closed this issue 8 months ago

muchengxue0911 commented 8 months ago

Thank you for your solid work. I'm trying to reproduce your results, but I can only get 65.3 mIoU for the generated CAMs. My environment is PyTorch 1.9.0 with CUDA 11.1 on a 3090. I only changed the file paths and used the following commands.

```
CUDA_VISIBLE_DEVICES=0 python generate_cams_voc12.py --split_file ./voc12/train_aug.txt --num_workers 1 --cam_out_dir ./output/voc12/cams
python eval_cam.py --cam_out_dir ./output/voc12/cams --cam_type attn_highres --split_file ./voc12/train.txt
```

and the result is:

```
1464 images to eval
1 0.6530047267030804
2 0.6320998620387147
{'Pixel Accuracy': 0.8778208683287688, 'Mean Accuracy': 0.8145488664308644,
 'Frequency Weighted IoU': 0.7928293309400717, 'Mean IoU': 0.6530047267030804,
 'Class IoU': {0: 0.8535294292729055, 1: 0.6130334632912042, 2: 0.6508222055180352,
  3: 0.7070871766635738, 4: 0.5137165538355816, 5: 0.5389611068698299,
  6: 0.7805507459439889, 7: 0.6934887957581246, 8: 0.7957022407450646,
  9: 0.3969126198448379, 10: 0.763830780381508, 11: 0.4783550023269957,
  12: 0.8016436417834419, 13: 0.7574893461824923, 14: 0.7504945026731186,
  15: 0.5885225316760763, 16: 0.520919667740022, 17: 0.7986631191640343,
  18: 0.5163718776232233, 19: 0.6667840730078058, 20: 0.526220380462825}}
```
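(For reference, the 'Mean IoU' above is just the unweighted average of the 21 'Class IoU' values, so 0.6530 is the 65.3 mentioned earlier. A minimal sketch, not part of the repo, to recompute it from the printout:)

```python
# Recompute Mean IoU from the per-class IoUs printed above (illustrative only).
class_iou = {
    0: 0.8535294292729055, 1: 0.6130334632912042, 2: 0.6508222055180352,
    3: 0.7070871766635738, 4: 0.5137165538355816, 5: 0.5389611068698299,
    6: 0.7805507459439889, 7: 0.6934887957581246, 8: 0.7957022407450646,
    9: 0.3969126198448379, 10: 0.763830780381508, 11: 0.4783550023269957,
    12: 0.8016436417834419, 13: 0.7574893461824923, 14: 0.7504945026731186,
    15: 0.5885225316760763, 16: 0.520919667740022, 17: 0.7986631191640343,
    18: 0.5163718776232233, 19: 0.6667840730078058, 20: 0.526220380462825,
}
mean_iou = sum(class_iou.values()) / len(class_iou)
print(f"Mean IoU: {mean_iou:.4f}")  # ~0.6530, i.e. 65.3
```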

Do you have any advice for reproducing your result?

linyq2117 commented 8 months ago

Thanks for your interest!

That result is strange; it shouldn't look like this given your command. Could you provide more information, such as which CLIP model you used and whether you made any modifications to the code?

muchengxue0911 commented 8 months ago

I downloaded the CLIP model from your link (https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt), and I only changed the VOC path; I didn't modify the code.
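For what it's worth, here is a minimal sketch (not from the repo) of loading that checkpoint with the official clip package to confirm it is the ViT-B/16 weights; the local path is hypothetical:

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
# clip.load accepts either a model name ("ViT-B/16") or a local checkpoint path;
# "./pretrained/ViT-B-16.pt" is a hypothetical path for the file downloaded above.
model, preprocess = clip.load("./pretrained/ViT-B-16.pt", device=device)
print(sum(p.numel() for p in model.parameters()))  # roughly 150M parameters for ViT-B/16
```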

linyq2117 commented 8 months ago

I used the same environment as you and re-ran the code. The result is normal on my side.

```
1464 images to eval
1 0.7084182165521185
2 0.6600291742360047
{'Pixel Accuracy': 0.9101851882152913, 'Mean Accuracy': 0.8892877034348168,
 'Frequency Weighted IoU': 0.8450196609472428, 'Mean IoU': 0.7084182165521185,
 'Class IoU': {0: 0.8902073794443365, 1: 0.65836592550838, 2: 0.486400383690162,
  3: 0.7686163773404336, 4: 0.5633237505301961, 5: 0.5757771927686446,
  6: 0.8440604824288674, 7: 0.75357773540567, 8: 0.8705375719476196,
  9: 0.4874239107791828, 10: 0.8117929379499486, 11: 0.6299910314353231,
  12: 0.868078446176208, 13: 0.8069361322790224, 14: 0.7739665005600384,
  15: 0.633124464824793, 16: 0.5484016946872441, 17: 0.8589386277035465,
  18: 0.7053738870822491, 19: 0.7689125257829013, 20: 0.5729755892697185}}
```

I have uploaded the generated CAMs to Google Drive. You can run eval_cam on them to check whether the problem is caused by CAM generation or by evaluation.
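For example, assuming the downloaded CAMs are extracted to ./downloaded_cams (a hypothetical path), the evaluation step alone can be run with the same flags as above:

```
python eval_cam.py --cam_out_dir ./downloaded_cams --cam_type attn_highres --split_file ./voc12/train.txt
```

If the downloaded CAMs reproduce the ~70.8 mIoU on your machine, the discrepancy comes from CAM generation on your side; if not, it points to the evaluation setup.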

muchengxue0911 commented 8 months ago

Thanks for your help. I have solved the problem.