Closed maryamag85 closed 3 years ago
Hi @maryamag85, thanks for your interest in our work! There is currently no specific effort to implement our method for CLIP. However, it should be applicable: back-propagate LRP and gradients to the attention layers of both the image and the text encoders, then apply our aggregation. We also have new work coming out very soon with a simplified and extended version of our method that does not use LRP, so it is much easier to implement, and it also works for non-self-attention layers such as encoder-decoder attention. Maybe that can help with your CLIP visualization? Hope you stay tuned :)
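To make the aggregation idea above concrete, here is a minimal sketch of gradient-weighted attention rollout (the flavor of the simplified, LRP-free variant described). It uses random stand-in tensors; `aggregate_relevance` and all shapes are illustrative assumptions, not the repo's actual API, and in a real model the attention maps and their gradients would be captured with forward/backward hooks.

```python
import numpy as np

def aggregate_relevance(attn_maps, attn_grads):
    """Gradient-weighted attention rollout (illustrative sketch).

    attn_maps / attn_grads: per-layer arrays of shape
    [heads, tokens, tokens]; in practice these would be captured
    from the transformer with forward and backward hooks.
    """
    num_tokens = attn_maps[0].shape[-1]
    # Each token starts out relevant only to itself.
    relevance = np.eye(num_tokens)
    for attn, grad in zip(attn_maps, attn_grads):
        # Gradient-weighted attention: keep the positive part,
        # then average over heads.
        cam = np.maximum(grad * attn, 0).mean(axis=0)
        # Propagate relevance through the layer (identity + update).
        relevance = relevance + cam @ relevance
    return relevance

# Dummy example: 4 layers, 8 heads, 50 tokens (1 [CLS] + 49 patches).
rng = np.random.default_rng(0)
attns = [rng.random((8, 50, 50)) for _ in range(4)]
grads = [rng.standard_normal((8, 50, 50)) for _ in range(4)]
rel = aggregate_relevance(attns, grads)
# Row 0 holds the relevance of every patch to the [CLS] token.
print(rel.shape)  # (50, 50)
```

For a bi-modal model like CLIP, the same aggregation would be run separately over the image and text attention stacks.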
Hi @maryamag85, great news: we have added CLIP to our new repo, along with a Colab notebook to run examples! Hope this helps :)
awesome news! thanks
Thanks for your great work. Is there any attempt to integrate CLIP into this model? I would like to see the attention maps of the CLIP model (ViT-base).