Closed: pzSuen closed this issue 2 years ago.
Hi. For the visualization part of the Transformer, we visualize the attention values from the class token to the remaining feature tokens in the self-attention matrix. For the specific visualization code, we followed the CLAM processing scheme. Hope this helps.
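In case it helps others reading this thread, here is a minimal sketch (not the authors' code) of what extracting the class-token attention and turning it into a patch-level heatmap might look like. The tensor shapes, variable names, and the simple min-max normalization are my own assumptions; the actual repository and the CLAM pipeline may differ in the details.

```python
import numpy as np
import torch


def class_token_heatmap(attn: torch.Tensor, coords: np.ndarray, patch_size: int = 256) -> np.ndarray:
    """Sketch: turn class-token attention into per-patch heatmap scores.

    Assumptions (not taken from the repo):
      - `attn` is a (num_heads, N+1, N+1) self-attention matrix where
        index 0 is the class token and indices 1..N are feature tokens.
      - `coords` is an (N, 2) integer array of top-left patch coordinates.
    """
    # Attention from the class token (query 0) to every feature token,
    # averaged over heads -> shape (N,)
    cls_attn = attn[:, 0, 1:].mean(dim=0)

    # Min-max normalize to [0, 1] so scores can be mapped to a colormap,
    # similar in spirit to the score normalization used for CLAM heatmaps.
    scores = (cls_attn - cls_attn.min()) / (cls_attn.max() - cls_attn.min() + 1e-8)
    scores = scores.cpu().numpy()

    # Paint each patch's score into a low-resolution canvas indexed by
    # patch grid position.
    h = coords[:, 1].max() // patch_size + 1
    w = coords[:, 0].max() // patch_size + 1
    canvas = np.zeros((h, w), dtype=np.float32)
    for (x, y), s in zip(coords, scores):
        canvas[y // patch_size, x // patch_size] = s
    return canvas
```

From there one would typically upsample the canvas and blend it over the slide thumbnail, which is what CLAM's heatmap utilities do.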
Hello, I have a question. In your paper, the self-attention is actually linear attention, yet you state that you show the product of the attention score and the value. How do you calculate the attention score? Can you answer my question? Looking forward to your reply.
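For readers following along: in standard self-attention the score matrix this question refers to is formed explicitly, whereas linear-attention variants only approximate it and never materialize it, which is what makes the visualization question non-trivial. Below is a small sketch of the standard computation purely for reference; the names and shapes are my own, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def standard_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Explicit (N x N) attention scores: softmax(Q K^T / sqrt(d)).

    q, k: (N, d) query/key matrices for a single head.
    Linear-attention variants avoid building this N x N matrix, so the
    scores have to be reconstructed or approximated for visualization.
    """
    d = q.size(-1)
    return F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
```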
Hello,
It seems that the paper does not mention how the attention over the tokens is computed, and I also cannot find the code that computes the heatmap and visualization.
Could you answer my questions? Looking forward to your reply.