salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

BLIP Image Captioning GradCAM? #317

Open gwyong opened 1 year ago

gwyong commented 1 year ago

Hi, I used BlipForConditionalGeneration from transformers for image captioning. I want to visualize, word by word, which parts of the image explain the generated caption, in the style of Grad-CAM.

I found code from ALBEF (https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb), but it uses an image-text matching model, not an image captioning model.

Can you give me any hints or simple code for this?
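
For reference, here is a minimal sketch of the combination step that the ALBEF notebook applies. It weights each cross-attention map by its positive gradient and averages over heads. The sketch handles one generated token at a time. It assumes you have already extracted that token's cross-attention weights and their gradients from the captioning model; how to hook those out of BlipForConditionalGeneration is not shown and is not confirmed here. The function name `gradcam_for_token` and its arguments are hypothetical, chosen only for illustration.

```python
from typing import List


def gradcam_for_token(
    attn: List[List[float]],
    grad: List[List[float]],
    grid_h: int,
    grid_w: int,
) -> List[List[float]]:
    """Grad-CAM style heatmap for ONE generated token (illustrative sketch).

    attn, grad: per-head cross-attention weights and their gradients, shaped
    [num_heads][1 + grid_h * grid_w]. Index 0 is assumed to be the image
    [CLS] token and is dropped, as in the ALBEF notebook.
    """
    num_patches = grid_h * grid_w
    heat = [0.0] * num_patches
    for head_attn, head_grad in zip(attn, grad):
        if len(head_attn) != num_patches + 1 or len(head_grad) != num_patches + 1:
            raise ValueError("expected 1 + grid_h * grid_w entries per head")
        for p in range(num_patches):
            # Keep only positive gradients, then weight the attention by them.
            positive_grad = max(head_grad[p + 1], 0.0)
            heat[p] += head_attn[p + 1] * positive_grad
    # Average over heads.
    heat = [value / len(attn) for value in heat]
    # Reshape the flat patch list into a grid_h x grid_w map.
    return [heat[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy input: 2 heads, a 2x2 patch grid, plus the leading [CLS] entry.
toy_attn = [[0.2, 0.5, 0.1, 0.1, 0.1], [0.2, 0.1, 0.5, 0.1, 0.1]]
toy_grad = [[1.0, 2.0, -1.0, 0.0, 1.0], [0.0, -3.0, 4.0, 1.0, 0.0]]
heatmap = gradcam_for_token(toy_attn, toy_grad, grid_h=2, grid_w=2)
```

Running this once per caption token would give one patch-level map per word, which could then be upsampled to the image size and overlaid on the image.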

shams2023 commented 11 months ago

BlipForConditionalGeneration

Hello, can BLIP2 generate captions for images in batches?