Could you give me some guidance on how to adapt this to a vision LLM like LLaVA?

jacobgil / pytorch-grad-cam

Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

https://jacobgil.github.io/pytorch-gradcam-book

MIT License

10.64k stars, 1.56k forks

Could you give me some guidance on how to adapt this to a vision LLM like LLaVA? #532

Open Yingshu-Li opened 1 month ago

Yingshu-Li commented 1 month ago

Hello,

I’m currently exploring how to visualize heatmaps for LLaVA and other multimodal large language models, to understand where the model focuses in the image during text generation. I am familiar with using Grad-CAM for single-target classification tasks. However, LLaVA generates complete sentences, so I’m unsure how to obtain a separate heatmap for each individual word. Could you provide any guidance or advice on how to approach this?

fuhaha824 commented 1 week ago

I have the same problem.
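
One possible starting point for per-word heatmaps, offered as a minimal sketch rather than a confirmed recipe: treat each generated token as its own "class". For the word of interest, take that token's logit at its decoding step as the scalar target, backpropagate it to the image-patch features, and apply the usual Grad-CAM weighting to those features. The function below shows only that final weighting step in plain Python. The name `token_grad_cam` is hypothetical, and it assumes you have already captured the patch activations and their gradients (for example with forward and backward hooks on the vision encoder) and that the patches are stored in row-major grid order. How to capture them from LLaVA is not shown here.

```python
def token_grad_cam(activations, gradients, grid_h, grid_w):
    """Grad-CAM heatmap over image patches for ONE generated token.

    activations: list of per-patch feature vectors (num_patches x dim),
                 assumed to be in row-major grid order.
    gradients:   same shape; gradient of the chosen token's logit
                 with respect to `activations`.
    Returns a grid_h x grid_w nested list scaled to [0, 1].
    """
    num_patches = len(activations)
    if num_patches != grid_h * grid_w or len(gradients) != num_patches:
        raise ValueError("patch count must equal grid_h * grid_w")

    dim = len(activations[0])

    # Channel weights: average each channel's gradient over all patches.
    weights = [
        sum(grad[k] for grad in gradients) / num_patches
        for k in range(dim)
    ]

    # Weighted sum over channels for each patch, followed by ReLU.
    cam = [
        max(0.0, sum(w * a for w, a in zip(weights, act)))
        for act in activations
    ]

    # Normalize by the maximum so the largest value is 1 (skip if all zero).
    peak = max(cam)
    if peak > 0:
        cam = [value / peak for value in cam]

    # Reshape the flat patch list into a 2D grid.
    return [cam[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy example: a 2x2 patch grid with 2 feature channels.
toy_activations = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
toy_gradients = [[1.0, -1.0]] * 4
heatmap = token_grad_cam(toy_activations, toy_gradients, grid_h=2, grid_w=2)
```

Repeating this for each generated token (one backward pass per token, each with that token's logit as the target) would give one heatmap per word, which can then be upsampled to the input image size and overlaid on it.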