Hi,
Not an actual issue, just wanted to share that I implemented your technique for Vision Transformers.
https://github.com/jacobgil/vit-explain
It includes a few tweaks to make the visualizations work well for images: discarding the lowest attention values, and fusing the attention heads with max instead of mean.
I also added an option to make the result class-specific, by weighting the attention with the gradient of the target class score (and masking out negative gradients).
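In case it's useful, here is a minimal sketch of the rollout variant with those two tweaks, assuming the attentions arrive as a list of per-layer NumPy arrays of shape (heads, tokens, tokens) with the CLS token at index 0. The function name and parameters are illustrative, not the repo's exact API.

```python
import numpy as np

def rollout(attentions, discard_ratio=0.9, head_fusion="max"):
    """Attention rollout with head fusion and low-attention discarding."""
    num_tokens = attentions[0].shape[-1]
    result = np.eye(num_tokens)
    for attn in attentions:
        # Fuse heads: max keeps the strongest head per position,
        # which tends to give sharper maps than averaging.
        if head_fusion == "max":
            fused = attn.max(axis=0)
        else:
            fused = attn.mean(axis=0)
        # Discard the lowest attention values, but never the CLS column,
        # so the CLS token's path through the layers is preserved.
        flat = fused.flatten()
        k = int(flat.size * discard_ratio)
        drop = np.argsort(flat)[:k]
        drop = drop[drop % num_tokens != 0]
        flat[drop] = 0.0
        fused = flat.reshape(fused.shape)
        # Account for residual connections, then row-normalize.
        a = (fused + np.eye(num_tokens)) / 2.0
        a = a / a.sum(axis=-1, keepdims=True)
        result = a @ result
    # Attention flowing from the CLS token to the patch tokens.
    return result[0, 1:]
```

For the class-specific option, the same loop can weight each layer's attention by the gradient of the target class score with respect to that attention, with negative gradients clamped to zero before fusing.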