Open gwyong opened 1 year ago
Hi, I used BlipForConditionalGeneration from transformers for image captioning. I want to visualize, word by word, which image regions drive each generated caption token, like Grad-CAM.
I found code from ALBEF (https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb), but it uses an image-text matching model, not an image captioning model.
Can you give me any hints or simple code for this?
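In case it helps anyone looking for a starting point: the ALBEF notebook's idea (weight the cross-attention maps by their gradients w.r.t. a target score) can be transplanted to BLIP's caption decoder. Below is a minimal, untested sketch against `BlipForConditionalGeneration` from `transformers`; the checkpoint name, the choice of decoder layer, and the 24×24 patch-grid assumption for the base 384px checkpoint are my assumptions, not an official recipe.

```python
# Grad-CAM-style relevance for BLIP captioning, adapted from the ALBEF
# visualization idea: relevance = ReLU(gradient * cross-attention),
# averaged over heads. This is a sketch under assumptions, not BLIP's
# official visualization code.
import torch


def relevance_map(cross_attn: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Grad-CAM-style relevance, averaged over attention heads.

    cross_attn, grad: (num_heads, text_len, num_image_tokens)
    returns:          (text_len, num_image_tokens)
    """
    cam = (grad * cross_attn).clamp(min=0)  # keep positively contributing attention
    return cam.mean(dim=0)                  # average over attention heads


def blip_caption_gradcam(image, layer: int = -1):
    """Caption an image, then backprop through the decoder's cross-attentions.

    `layer` picks which decoder layer to visualize (an assumption; ALBEF
    also hand-picks one layer). Returns (caption, per-token relevance maps).
    """
    from transformers import BlipProcessor, BlipForConditionalGeneration

    name = "Salesforce/blip-image-captioning-base"  # assumed checkpoint
    processor = BlipProcessor.from_pretrained(name)
    model = BlipForConditionalGeneration.from_pretrained(name)
    model.eval()

    # 1) Generate a caption normally.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    caption_ids = model.generate(pixel_values=pixel_values)
    caption = processor.decode(caption_ids[0], skip_special_tokens=True)

    # 2) Re-run a forward pass with the generated caption as labels so we
    #    can take gradients w.r.t. the cross-attention maps.
    image_embeds = model.vision_model(pixel_values)[0]
    outputs = model.text_decoder(
        input_ids=caption_ids,
        encoder_hidden_states=image_embeds,
        labels=caption_ids,
        output_attentions=True,
    )
    attn = outputs.cross_attentions[layer]       # (1, heads, text_len, img_tokens)
    grad = torch.autograd.grad(outputs.loss, attn)[0]

    cam = relevance_map(attn[0], grad[0])        # (text_len, img_tokens)
    # Drop the [CLS] image token and reshape patches to a square grid
    # (24x24 for ViT-B/16 at 384x384 -- an assumption about the checkpoint).
    side = int((cam.shape[-1] - 1) ** 0.5)
    return caption, cam[:, 1:].reshape(-1, side, side)
```

Each row of the returned maps can then be upsampled and overlaid on the input image for the corresponding caption token, as in the ALBEF notebook.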
Hi, can BLIP2 generate captions for images in batches?
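Yes — the `transformers` processor accepts a list of images, so batching is just chunking the image list. A rough sketch below; the checkpoint name, batch size, dtype, and device placement are assumptions you would adapt to your hardware.

```python
# Hedged sketch: batched image captioning with BLIP-2 via transformers.
# Checkpoint, batch size, fp16, and "cuda" device are assumptions.
from typing import Iterator, List


def chunks(items: List, size: int) -> Iterator[List]:
    """Yield successive fixed-size slices of a list (the last may be shorter)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def caption_in_batches(images: List, batch_size: int = 8) -> List[str]:
    """Caption a list of PIL images in batches and return one caption each."""
    import torch
    from transformers import Blip2Processor, Blip2ForConditionalGeneration

    name = "Salesforce/blip2-opt-2.7b"  # assumed checkpoint
    processor = Blip2Processor.from_pretrained(name)
    model = Blip2ForConditionalGeneration.from_pretrained(
        name, torch_dtype=torch.float16
    ).to("cuda")

    captions = []
    for batch in chunks(images, batch_size):
        inputs = processor(images=batch, return_tensors="pt").to("cuda", torch.float16)
        out = model.generate(**inputs, max_new_tokens=30)
        captions.extend(processor.batch_decode(out, skip_special_tokens=True))
    return captions
```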