salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Blip2 Attention Map Visualization #646

Open SeeonOwO opened 5 months ago

SeeonOwO commented 5 months ago

Hi guys, I am trying to visualize the attention maps of the pre-trained BLIP-2 model blip2-opt-6.7b.

I set the attention-output flags to True and successfully retrieved cross_attentions from the output object BaseModelOutputWithPoolingAndCrossAttentions. There are 6 cross-attention tensors (the Q-Former has cross-attention in every other of its 12 layers), each of shape batch_size (1) x 12 x 257 x 32. All of these are consistent with the config files (number of heads, number of image patches + 1, number of query tokens). So I swapped the last two dimensions, averaged over the query tokens and heads, dropped the first ([CLS]) token, reshaped the remaining 256 values to 16 x 16, and overlaid the map on the original image (a sketch of the procedure follows below). However, the visualizations look like random noise:
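For reference, here is a minimal sketch of the steps described above, assuming `cross_attentions` is the tuple of 6 tensors (each 1 x 12 x 257 x 32, as reported) and `raw_image` is the original PIL image; both names are placeholders for this thread, not LAVIS API:

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image

# Placeholder inputs (not LAVIS API):
#   cross_attentions: tuple of 6 tensors, each (1, 12, 257, 32),
#                     taken from BaseModelOutputWithPoolingAndCrossAttentions
#   raw_image:        the original PIL.Image the model was run on

attn = torch.stack(cross_attentions).mean(0)  # average the 6 layers      -> (1, 12, 257, 32)
attn = attn.permute(0, 1, 3, 2)               # swap the last two dims    -> (1, 12, 32, 257)
attn = attn.mean(dim=(1, 2))                  # average heads and queries -> (1, 257)
attn = attn[0, 1:]                            # drop the [CLS] token      -> (256,)
attn_map = attn.reshape(16, 16).detach().cpu().numpy()

# Upsample the 16x16 map to the image resolution and overlay it.
w, h = raw_image.size
attn_map = attn_map / attn_map.max()
heat = Image.fromarray((attn_map * 255).astype(np.uint8)).resize((w, h), Image.BICUBIC)

plt.imshow(raw_image)
plt.imshow(np.asarray(heat), cmap="jet", alpha=0.5)
plt.axis("off")
plt.show()
```

Note that the permute assumes the reported (..., 257, 32) layout; standard Transformers-style attention outputs are (batch, heads, query_len, key_len), which for the Q-Former's 32 queries over 257 image tokens would already be (1, 12, 32, 257).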

[attached screenshot: the resulting attention overlay, which looks like random noise]

Could anyone give me some advice?

GasolSun36 commented 3 months ago

Same question here. Any solutions?

changbaozhou commented 1 month ago

Same question for the Q-Former in InstructBLIP. Any solutions?