ychengrong opened 2 years ago
https://drive.google.com/drive/folders/1_lxspG_nzPstxDWhKQqPWhYZlB6zPMGs?usp=sharing Hi, Daquan! I tried the code and the .pth.tar file you provided above, but the output visualization I got for layer 1 looks like this. The model key I used was `blocks.{layer_index}.attn.qkv.weight`. Could you give me some advice on this? I'd appreciate it!
Hi,
The visualizations in the paper are based on the attention maps. When plotting the attention maps, we take an average along the head dimension to get an N×N attention map for the 2D plots. We have not plotted the raw weights before, but I would expect them to roughly follow a Gaussian distribution in general.
I hope this can help you a little bit. Do drop me a message if any other parts are not clear.
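For reference, below is a minimal sketch (not the authors' original script) of how a head-averaged attention map could be extracted from a block, following the description above: recompute softmax(QKᵀ/√d) inside a forward hook on a block's attention module and average over the head dimension. The attribute names (`blocks`, `attn.qkv`, `num_heads`, `scale`) assume a timm-style ViT implementation and may differ in other codebases.

```python
import torch

@torch.no_grad()
def layer_attention_map(model, x, layer_index):
    """Return the head-averaged N x N attention map of one block (timm-style ViT assumed)."""
    maps = {}

    def hook(module, inp, out):
        tokens = inp[0]                            # (B, N, C) tokens entering the attention module
        B, N, C = tokens.shape
        H = module.num_heads
        # Recompute Q, K from the fused qkv projection
        qkv = module.qkv(tokens).reshape(B, N, 3, H, C // H).permute(2, 0, 3, 1, 4)
        q, k = qkv[0], qkv[1]                      # each (B, H, N, C//H)
        attn = (q @ k.transpose(-2, -1)) * module.scale
        attn = attn.softmax(dim=-1)                # (B, H, N, N)
        maps["attn"] = attn.mean(dim=1)            # average over heads -> (B, N, N)

    handle = model.blocks[layer_index].attn.register_forward_hook(hook)
    model(x)
    handle.remove()
    return maps["attn"][0].cpu()                   # N x N map for the first image in the batch

# Hypothetical usage, assuming `model` is a timm-style ViT in eval mode and
# `img` is a preprocessed (1, 3, 224, 224) tensor:
# attn = layer_attention_map(model, img, layer_index=0)
# import matplotlib.pyplot as plt
# plt.imshow(attn); plt.colorbar(); plt.show()
```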
Many thanks for the quick reply! You've done great work, and I want to reproduce the output shown in the paper. I noticed the attention maps in the shallow blocks appear as white lines along the diagonal. Could you please share the attention data you used for those maps, or, if it's convenient, the pickle file for this attention map? (The link you provided before seems to be unavailable.)