Open zrnupping opened 1 year ago
Hi! I think calculating the similarity between different attention maps is a good way to explain the influence of transformer depth. Could you provide clean code for calculating the cosine similarity?
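
For concreteness, here is a minimal sketch of the kind of computation being asked about. It is not the repository's implementation. It assumes each layer's attention map is available as a 2-D list of weights (queries by keys) and that all maps share the same shape. The names `flatten`, `cosine_similarity`, and `layerwise_similarity` are hypothetical, and the code is framework-free so it can be ported to NumPy or PyTorch tensors.

```python
import math


def flatten(attn_map):
    """Flatten a 2-D attention map (a list of rows) into a single vector."""
    return [w for row in attn_map for w in row]


def cosine_similarity(map_a, map_b, eps=1e-12):
    """Cosine similarity between two attention maps of identical shape."""
    a, b = flatten(map_a), flatten(map_b)
    if len(a) != len(b):
        raise ValueError("attention maps must have the same shape")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for an all-zero map.
    return dot / max(norm_a * norm_b, eps)


def layerwise_similarity(attn_maps):
    """Pairwise cosine similarity between the maps of all layers.

    attn_maps: list with one 2-D attention map per layer (assumed layout).
    Returns an n_layers x n_layers nested list.
    """
    n = len(attn_maps)
    return [
        [cosine_similarity(attn_maps[i], attn_maps[j]) for j in range(n)]
        for i in range(n)
    ]
```

Calling `layerwise_similarity` on the per-layer maps of one head (or on head-averaged maps) gives a matrix whose off-diagonal entries show how similar the attention patterns of different depths are.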