zhoudaquan / dvit_repo

MIT License
135 stars, 23 forks

cosine similarity of different attention maps #12

Open: zrnupping opened this issue 1 year ago

zrnupping commented 1 year ago

Hi! I think that calculating the similarity between the attention maps of different layers, to explain the influence of transformer depth, is a good idea. Could you provide clean code for calculating the cosine similarity?
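Until the authors share their script, here is a minimal NumPy sketch. It is not the code used in dvit_repo. It assumes each layer's attention map is already available as an array of shape `(batch, heads, tokens, tokens)`, for example the post-softmax weights of each block. It also assumes that two layers are compared by taking, for each head and each query token, the cosine similarity of that token's attention row, and then averaging. The DeepViT paper's exact definition (which axis is compared and how the values are averaged) may differ. The function names `cross_layer_attention_similarity` and `similarity_matrix` are hypothetical.

```python
import numpy as np


def cross_layer_attention_similarity(attn_p, attn_q, eps=1e-8):
    """Cosine similarity between the attention maps of two layers.

    attn_p, attn_q: arrays of shape (batch, heads, tokens, tokens) (assumed layout).
    Returns shape (batch, heads, tokens): one value per head and query token,
    comparing that token's attention row in layer p with the same row in layer q.
    """
    if attn_p.shape != attn_q.shape:
        raise ValueError("attention maps must have the same shape")
    # Dot product of matching rows along the last (key) axis.
    dot = np.sum(attn_p * attn_q, axis=-1)
    # Product of the row norms; eps guards against division by zero.
    norms = np.linalg.norm(attn_p, axis=-1) * np.linalg.norm(attn_q, axis=-1)
    return dot / np.maximum(norms, eps)


def similarity_matrix(attn_maps, eps=1e-8):
    """(L, L) matrix whose entry [p, q] is the mean similarity of layers p and q.

    attn_maps: list of L arrays, each (batch, heads, tokens, tokens).
    The mean is taken over batch, heads and tokens (an assumed aggregation).
    """
    num_layers = len(attn_maps)
    sim = np.zeros((num_layers, num_layers))
    for p in range(num_layers):
        for q in range(num_layers):
            sim[p, q] = cross_layer_attention_similarity(
                attn_maps[p], attn_maps[q], eps
            ).mean()
    return sim


# Demo with random softmax-normalized "attention maps" for 3 layers
# (batch=2, heads=4, tokens=5); replace these with maps collected from the model.
rng = np.random.default_rng(0)
logits = [rng.normal(size=(2, 4, 5, 5)) for _ in range(3)]
maps = [np.exp(x) / np.exp(x).sum(-1, keepdims=True) for x in logits]
S = similarity_matrix(maps)
print(S.round(3))
```

The diagonal of `S` is 1 and the matrix is symmetric. Because attention weights are non-negative, every entry lies in [0, 1]. Values close to 1 for deep layer pairs would indicate that those layers produce nearly identical attention. If you prefer to stay in PyTorch, `torch.nn.functional.cosine_similarity(a, b, dim=-1)` computes the same per-row quantity. This sketch does not show how the per-block attention maps are extracted from the dvit_repo model, for example with a forward hook on each attention module, because that depends on the repo's code.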