Closed maryamag85 closed 3 years ago
Hi @maryamag85, thanks for your interest in our work! There is currently no specific effort to implement our method for CLIP. However, it should be applicable: back-propagate LRP and gradients to the attention layers of both the image and the text encoders, then apply our aggregation. We also have new work coming out very soon with a simplified and extended version of our method that does not use LRP, so it is much easier to implement, and it also works for non-self-attention layers such as encoder-decoder attention. Maybe that can help with your CLIP visualization? Hope you stay tuned :)
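To make the aggregation idea above concrete, here is a minimal sketch of gradient-weighted attention rollout (the flavor of the simplified, LRP-free variant described). It uses random stand-in tensors; `aggregate_relevance` and all shapes are illustrative assumptions, not the repo's actual API, and in a real model the attention maps and their gradients would be captured with forward/backward hooks.

```python
import numpy as np

def aggregate_relevance(attn_maps, attn_grads):
    """Gradient-weighted attention rollout (illustrative sketch).

    attn_maps / attn_grads: per-layer arrays of shape
    [heads, tokens, tokens]; in practice these would be captured
    from the transformer with forward and backward hooks.
    """
    num_tokens = attn_maps[0].shape[-1]
    # Each token starts out relevant only to itself.
    relevance = np.eye(num_tokens)
    for attn, grad in zip(attn_maps, attn_grads):
        # Gradient-weighted attention: keep the positive part,
        # then average over heads.
        cam = np.maximum(grad * attn, 0).mean(axis=0)
        # Propagate relevance through the layer (identity + update).
        relevance = relevance + cam @ relevance
    return relevance

# Dummy example: 4 layers, 8 heads, 50 tokens (1 [CLS] + 49 patches).
rng = np.random.default_rng(0)
attns = [rng.random((8, 50, 50)) for _ in range(4)]
grads = [rng.standard_normal((8, 50, 50)) for _ in range(4)]
rel = aggregate_relevance(attns, grads)
# Row 0 holds the relevance of every patch to the [CLS] token.
print(rel.shape)  # (50, 50)
```

For a bi-modal model like CLIP, the same aggregation would be run separately over the image and text attention stacks.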
Hi @maryamag85, great news: we have added CLIP to our new repo, along with a Colab notebook to run examples! Hope this helps :)
awesome news! thanks
Thanks for your great work. Is there any attempt to integrate CLIP into this model? I would like to see the attention maps of the CLIP model (ViT-base).