Hi @SketchX-QZY, thanks for your interest in our work!
I can't really see the attention implementation itself here, but you need to register the hook on the attention heads after the softmax operation. See this code for an example. If `self_attention_outputs` indeed contains the attention heads after the softmax, it should be fine.
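For concreteness, here is a minimal sketch of the idea using HuggingFace's `ViTModel` directly (not the exact code from this repo): it assumes that, with `output_attentions=True`, the returned `attentions` are the post-softmax attention maps and are still part of the autograd graph, so a gradient hook can be registered on each of them.

```python
import torch
from transformers import ViTModel

model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
model.eval()

attn_maps, attn_grads = {}, {}

pixel_values = torch.randn(1, 3, 224, 224)  # dummy image batch
outputs = model(pixel_values, output_attentions=True)

# outputs.attentions holds one (batch, heads, tokens, tokens) tensor per layer,
# taken from inside each self-attention block after the softmax.
for i, attn in enumerate(outputs.attentions):
    attn_maps[i] = attn                                                 # attention map of layer i
    attn.register_hook(lambda grad, i=i: attn_grads.update({i: grad}))  # its gradient

# Backpropagating any scalar (here just a dummy one) fills attn_grads.
outputs.last_hidden_state.mean().backward()
```

If you modify the HuggingFace source instead, the tensor to hook is the same one: the attention probabilities right after the softmax.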
Best, Hila.
@SketchX-QZY Closing due to inactivity; please reopen if necessary.
Hi, thank you for this great work!
I have trained a Transformer model with HuggingFace's ViT. When I tried to visualise the attention maps, I found your work. I am quite interested, but I find that your code and HuggingFace's are different. I tried to modify the source code like this.
I am new to Transformers. I am not sure whether I registered the hook on the right tensor. Can you help me check it?
Thank you very much!