I found Figure 1 to be very insightful. However, I’m having some difficulty reproducing it. I’ve managed to extract the attention weights from the last layer of CLIP (in the shape `num_heads * seq_len * seq_len`).
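For reference, this is roughly how I'm extracting and visualizing the weights on my side (a minimal sketch using the HuggingFace `transformers` CLIP implementation; the checkpoint name, the head-averaging, and slicing the CLS row are my own guesses, not necessarily what was used for Figure 1):

```python
import torch
import matplotlib.pyplot as plt
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumption: a ViT-based CLIP checkpoint from HuggingFace, not necessarily the one in the paper
model_name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model.vision_model(**inputs, output_attentions=True)

# Last-layer attention: (batch, num_heads, seq_len, seq_len)
attn = outputs.attentions[-1][0]       # drop the batch dim -> (num_heads, seq_len, seq_len)
cls_attn = attn[:, 0, 1:].mean(dim=0)  # CLS-token attention over patch tokens, averaged across heads

# Reshape the patch attention into a square grid and plot it
grid = int(cls_attn.numel() ** 0.5)    # e.g. 7x7 for ViT-B/32 at 224x224
plt.imshow(cls_attn.reshape(grid, grid).numpy(), cmap="viridis")
plt.colorbar()
plt.title("CLS attention, last layer (head-averaged)")
plt.show()
```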
Could you please share the code used to plot this figure (perhaps using CLIP's standard attention weights as an example)?
Thank you for the excellent work!