salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method

Whole-sentence visualization in Fig.4 and Fig.10 #83

Closed haoshuai714 closed 2 years ago

haoshuai714 commented 2 years ago

Hello! I want to produce Grad-CAM visualizations for the whole sentence. I have changed the "Compute GradCAM" section as follows:

    with torch.no_grad():
        mask = text_input.attention_mask.view(text_input.attention_mask.size(0), 1, -1, 1, 1)

        grads = model.text_encoder.base_model.base_model.encoder.layer[block_num].crossattention.self.get_attn_gradients().detach()
        cams = model.text_encoder.base_model.base_model.encoder.layer[block_num].crossattention.self.get_attention_map().detach()

        cams = cams[:, :, :, 1:].reshape(image.size(0), 12, -1, 24, 24) * mask
        grads = grads[:, :, :, 1:].clamp(min=0).reshape(image.size(0), 12, -1, 24, 24) * mask

        gradcam = cams * grads
        gradcam = gradcam.mean(1).mean(1)

How do I visualize the Grad-CAM for the whole sentence? Could you give me the full code for the whole-sentence Grad-CAM visualization? (I am not very familiar with Grad-CAM.)
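[Editor's note] Once `gradcam` has been reduced to a single 24×24 map per image (as in the last line of the snippet above), visualizing it is an upsample-and-blend step. Below is a minimal sketch, not the repository's own notebook code: `rgb_image` is assumed to be the un-normalized input image as an H×W×3 array in [0, 1], and `show_gradcam` is a hypothetical helper name.

```python
import torch.nn.functional as F
import matplotlib.pyplot as plt

def show_gradcam(rgb_image, gradcam_map, alpha=0.5):
    """Overlay a low-resolution Grad-CAM map on the original image.

    rgb_image:   H x W x 3 array in [0, 1] (the un-normalized input image, assumed)
    gradcam_map: 2-D tensor, e.g. one 24 x 24 map from the snippet above
    """
    h, w, _ = rgb_image.shape
    # Upsample the 24x24 map to the image resolution.
    cam = F.interpolate(gradcam_map[None, None, ...], size=(h, w),
                        mode='bicubic', align_corners=False)[0, 0]
    # Normalize to [0, 1] so the colormap is comparable across images.
    cam = cam.clamp(min=0)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

    plt.imshow(rgb_image)
    plt.imshow(cam.cpu().numpy(), cmap='jet', alpha=alpha)  # heat-map overlay
    plt.axis('off')
    plt.show()

# e.g. show_gradcam(rgb_image, gradcam[0]) for the first image in the batch
```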

LiJunnan1992 commented 2 years ago

To obtain the Grad-CAM for a whole sentence, we simply average the per-token Grad-CAM maps across all text tokens.
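[Editor's note] A minimal sketch of that averaging step, reusing the `cams`, `grads`, and `text_input` tensors from the snippet above. The masked mean is my own choice (not stated in the thread): because padding positions were already zeroed by the attention mask, it differs from a plain `.mean(1)` over tokens only by a per-sentence scale factor, which is irrelevant once the map is normalized for display.

```python
# cams, grads: (batch, 12 heads, text_tokens, 24, 24), already multiplied by
# the padding mask in the snippet above
gradcam = cams * grads

# Average over the 12 attention heads: (batch, text_tokens, 24, 24)
gradcam = gradcam.mean(dim=1)

# Masked mean over the valid text tokens, so zeroed padding positions
# do not dilute the whole-sentence map.
token_mask = text_input.attention_mask.unsqueeze(-1).unsqueeze(-1).float()  # (batch, tokens, 1, 1)
sentence_gradcam = (gradcam * token_mask).sum(dim=1) / token_mask.sum(dim=1)

# sentence_gradcam: (batch, 24, 24) -- one whole-sentence Grad-CAM map per image
```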

haoshuai714 commented 2 years ago

Thanks!

BennoKrojer commented 2 years ago

Hi! Have you considered, or empirically tested, whether the [CLS] token works just as well for whole-sentence visualization?

rhyhck commented 6 months ago

> How do I visualize the Grad-CAM for the whole sentence?

Hi, could you tell me how to produce the Grad-CAM visualization for the whole sentence?