Hi! Thank you for your great work!
I'm a bit curious here about how you calculated the cosine similarity.
The code computes the similarity simply as similarity_matrix = torch.matmul(features, features.T).
I understand why now, thanks. Because the hidden states are L2-normalized first, their dot product is exactly their cosine similarity, so no separate cosine computation is needed.
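For anyone else who lands here, this is a minimal sketch of that equivalence, not the repository's actual code. It assumes features is a 2-D (batch, dim) tensor and that the normalization is done with F.normalize along dim=1 before the matmul; the random input and the names batch_size, dim, and reference are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch of hidden states: 4 samples, 8 dimensions each.
batch_size, dim = 4, 8
features = torch.randn(batch_size, dim)

# L2-normalize each row so every feature vector has unit length.
features = F.normalize(features, dim=1)

# Plain dot products between all pairs of (unit-length) rows.
similarity_matrix = torch.matmul(features, features.T)

# Explicit pairwise cosine similarity, for comparison.
reference = F.cosine_similarity(
    features.unsqueeze(1), features.unsqueeze(0), dim=2
)

# The two agree up to floating-point error.
assert torch.allclose(similarity_matrix, reference, atol=1e-6)
```

Since cos(a, b) = (a · b) / (‖a‖ ‖b‖), the denominator is 1 once every row has unit norm, which is why the matmul alone is enough.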