muzairkhattak / ProText

[CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models".
https://muzairkhattak.github.io/ProText/
MIT License

How to implement visualizations like Figure 5 in the supplementary of your paper? #1

Closed · auniquesun closed this issue 8 months ago

auniquesun commented 9 months ago

@muzairkhattak Thanks for sharing the paper and code. It is great work!

I am very interested in prompt learning for large foundation models and in learning the related techniques. As the title says, could you share your method for visualizing the prompt attention map on an image, such as Figure 5 in the paper? Thanks.

muzairkhattak commented 9 months ago

Hi @auniquesun,

Thank you for showing interest in ProText!

We compute the attention maps using relevancy maps derived from the attention layers of the CLIP model, following this paper. You can refer to their repository or this online Colab notebook.
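For reference, here is a minimal sketch of how the relevancy rule from that paper can be accumulated over the attention layers, assuming you have already collected the attention probabilities and their gradients from a backward pass. The function name, tensor names, and shapes below are illustrative, not the exact code from their repository:

```python
import torch

def compute_relevancy(attn_probs, attn_grads):
    """Accumulate a relevancy map over transformer layers using the rule
    R <- R + mean_h[(grad * attn)^+] @ R (Chefer et al. style).

    attn_probs / attn_grads: lists with one tensor per attention layer,
    each of shape (heads, tokens, tokens). These are assumed to have been
    saved via hooks on the attention layers.
    """
    num_tokens = attn_probs[0].shape[-1]
    R = torch.eye(num_tokens, device=attn_probs[0].device)
    for attn, grad in zip(attn_probs, attn_grads):
        # Positive gradient-weighted attention, averaged over heads
        cam = (grad * attn).clamp(min=0).mean(dim=0)
        R = R + cam @ R
    # Relevance of the patch tokens with respect to the CLS token;
    # reshape this vector to the patch grid to get a spatial heatmap.
    patch_relevance = R[0, 1:]
    return patch_relevance
```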

To produce attention maps with ProText prompts, we replace the MHSA blocks in the residual blocks of our ProText model here with a locally defined MHSA function (taken from here), so that forward hooks can be attached to its attention layers.
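Below is a rough, self-contained sketch of what such a locally defined MHSA block can look like: it saves the attention probabilities during the forward pass and registers a tensor hook to capture their gradients, so both can later be fed into a relevancy computation like the one above. The class name, layer layout, and tensor shapes are assumptions for illustration and not the actual ProText implementation:

```python
import torch

class HookedMHSA(torch.nn.Module):
    """Stand-in multi-head self-attention that exposes its attention
    probabilities and their gradients for relevancy-map computation."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = torch.nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = torch.nn.Linear(embed_dim, embed_dim)
        self.attn_probs = None   # saved during the forward pass
        self.attn_grads = None   # saved by the tensor hook during backward

    def forward(self, x):
        # x: (tokens, batch, embed_dim), the layout used in CLIP's residual blocks
        n, b, c = x.shape
        qkv = self.qkv(x).reshape(n, b, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 1, 3, 0, 4)              # each: (batch, heads, tokens, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)

        # Keep the attention probabilities and hook their gradients
        self.attn_probs = attn
        if attn.requires_grad:
            attn.register_hook(lambda g: setattr(self, "attn_grads", g))

        out = (attn @ v).permute(2, 0, 1, 3).reshape(n, b, c)
        return self.proj(out)
```

After a backward pass from the image-text similarity score, the `attn_probs` and `attn_grads` collected from each block can be passed to a relevancy accumulation routine such as the `compute_relevancy` sketch above.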

I hope it is helpful!

Kind regards, Muhammad Uzair

muzairkhattak commented 8 months ago

Hi @auniquesun,

I hope your issue is resolved now!

Feel free to reopen the issue if anything is still unclear. Thank you.

auniquesun commented 8 months ago

Thanks for your reply. I will try it out.