salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

visualize BLIP attention #80

Closed: nikky4D closed this issue 2 years ago

nikky4D commented 2 years ago

Is there a way to visualize what or where BLIP focuses on in an image when given an input text, similar to how Grad-CAM visualizes which regions drive a prediction?

LiJunnan1992 commented 2 years ago

Hi, you can use this code to visualize Grad-CAM on the cross-attention maps for BLIP:

https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb
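For readers who want the idea without opening the notebook: Grad-CAM on attention maps is commonly computed by multiplying each attention weight by the positive part of its gradient and averaging over heads. The sketch below is framework-free and is not the notebook's code. It assumes you have already extracted, for one cross-attention layer, the attention weights and their gradients with respect to the model's output score, as nested lists indexed `[head][text_token][image_token]`, with image token 0 being the `[CLS]` token. The function name `gradcam_from_cross_attention` is hypothetical.

```python
def gradcam_from_cross_attention(attn, grads, has_cls=True):
    """Combine attention weights with their gradients, Grad-CAM style.

    attn, grads: nested lists shaped [head][text_token][image_token]
    (an assumed layout; adapt the indexing to however you extract them).
    Returns a nested list shaped [text_token][patch], averaged over heads.
    """
    num_heads = len(attn)
    num_text = len(attn[0])
    start = 1 if has_cls else 0  # skip the image [CLS] token if present
    num_patches = len(attn[0][0]) - start

    cam = [[0.0] * num_patches for _ in range(num_text)]
    for h in range(num_heads):
        for t in range(num_text):
            for p in range(num_patches):
                a = attn[h][t][start + p]
                g = max(grads[h][t][start + p], 0.0)  # keep positive gradients only
                cam[t][p] += a * g / num_heads
    return cam


# Toy example: 2 heads, 1 text token, [CLS] + 4 image patches.
attn = [[[0.2, 0.1, 0.4, 0.2, 0.1]],
        [[0.2, 0.3, 0.2, 0.2, 0.1]]]
grads = [[[1.0, 2.0, -1.0, 0.5, 0.0]],
         [[1.0, 0.0, 1.0, 0.5, 0.0]]]
cam = gradcam_from_cross_attention(attn, grads)
```

Each row of `cam` holds one score per image patch for the corresponding text token. Negative gradients are clamped to zero so that only patches that increase the score contribute.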

nikky4D commented 2 years ago

Thank you. I'll take a look.

linhlt-it-ee commented 1 year ago

Is there any documentation that explains how to convert an attention map to positions in the image?
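One way to think about the mapping, assuming a standard ViT image encoder: the image is cut into a grid of square patches in row-major order, and the image-token axis of the cross-attention map has one entry per patch, preceded by a `[CLS]` token. Patch `k` then sits at row `k // grid` and column `k % grid`. The sketch below assumes a 384x384 input with 16x16 patches (a 24x24 grid, 576 patches); check these values against your checkpoint's configuration. The helper names `patch_to_box` and `to_grid` are hypothetical.

```python
def patch_to_box(token_index, image_size=384, patch_size=16, has_cls=True):
    """Return the (left, top, right, bottom) pixel box for an image token.

    Assumes row-major patch order and, if has_cls is True, a [CLS] token
    at index 0 that does not correspond to any image region.
    """
    grid = image_size // patch_size
    k = token_index - 1 if has_cls else token_index
    if not 0 <= k < grid * grid:
        raise ValueError("token index does not refer to an image patch")
    row, col = divmod(k, grid)
    left, top = col * patch_size, row * patch_size
    return (left, top, left + patch_size, top + patch_size)


def to_grid(patch_scores, grid):
    """Reshape a flat list of per-patch scores into `grid` rows of `grid` values."""
    if len(patch_scores) != grid * grid:
        raise ValueError("expected grid * grid patch scores")
    return [patch_scores[r * grid:(r + 1) * grid] for r in range(grid)]


box_first = patch_to_box(1)    # first patch: top-left corner of the image
box_last = patch_to_box(576)   # last patch: bottom-right corner of the image
heatmap = to_grid(list(range(576)), 24)  # dummy scores, just to show the shape
```

To overlay the result on the image, the low-resolution grid is typically upsampled to the image size (for example with bilinear interpolation) and blended with the pixels.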

BingliangLi commented 1 year ago

Hi, did you manage to get the cross-attention maps for BLIP? If so, could you please share your code with us?

BoomShakaY commented 1 year ago

I have the same question about visualizations.