Open wildwolff opened 1 year ago
@wildwolff Hi, it's just a matter of implementation. The inner part of Eq. 1 essentially computes the inner product of two feature vectors. You could equally use bmm after transposing one of the operands ([Bx1xC] * [BxCx1] = [Bx1x1]), which is equivalent to the way I implemented it.
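For anyone who wants to check the equivalence numerically, here is a minimal sketch. The shapes (`B`, `C`) and the random tensors are assumptions for illustration only; they are not taken from the repository.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C = 4, 8  # hypothetical batch size and embedding width

# Stand-ins for the projected image and text embeddings, shape [B, 1, C]
img_embed = torch.randn(B, 1, C)
text_embed = torch.randn(B, 1, C)

img_n = F.normalize(img_embed, p=2, dim=-1)
text_n = F.normalize(text_embed, p=2, dim=-1)

# Element-wise product, then sum over the channel axis -> [B, 1, 1]
score_elementwise = (img_n * text_n).sum(dim=-1, keepdim=True)

# Batched matrix product after transposing the second operand:
# [B, 1, C] @ [B, C, 1] -> [B, 1, 1]
score_bmm = torch.bmm(img_n, text_n.transpose(1, 2))

same = torch.allclose(score_elementwise, score_bmm, atol=1e-6)
print(same)
```

Both paths compute the same per-sample inner product of unit vectors, i.e. the cosine similarity, so the choice between them is a matter of convenience.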
I cannot understand why S(x, y) in Eq. 1 can be seen as the relevance score. Also, the code computes verify_score by element-wise multiplication without a transpose, which is a little different from Eq. 1. Could you explain this further? Thanks a lot!
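    text_embed = self.text_proj(text_info)
    img_embed = self.img_proj(img_feat)
    # Cosine similarity: element-wise product of the L2-normalized embeddings, summed over the last dim
    verify_score = (F.normalize(img_embed, p=2, dim=-1) * F.normalize(text_embed, p=2, dim=-1)).sum(dim=-1, keepdim=True)
    # Map the similarity through a scaled Gaussian-shaped function
    verify_score = self.tf_scale * torch.exp(- (1 - verify_score).pow(self.tf_pow) / (2 * self.tf_sigma**2))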
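One way to read the last line of the snippet: it maps the cosine similarity through a Gaussian-shaped function, so the result is largest when the two embeddings point the same way and decays as they diverge. The sketch below uses plain Python and made-up values for `tf_scale`, `tf_pow` and `tf_sigma`; the actual values used in the repository are not given in this thread.

```python
import math

def verify_score(cos_sim, tf_scale=1.0, tf_pow=2.0, tf_sigma=0.5):
    """Scalar version of the mapping in the snippet.

    The default parameter values are hypothetical, chosen only to
    illustrate the shape of the function.
    """
    return tf_scale * math.exp(-((1.0 - cos_sim) ** tf_pow) / (2.0 * tf_sigma ** 2))

# Evaluate at a few cosine similarities between -1 and 1
for s in (-1.0, 0.0, 0.5, 1.0):
    print(s, verify_score(s))
```

With these assumed parameters the score equals `tf_scale` at a cosine similarity of 1 and falls monotonically as the similarity drops, which is consistent with treating it as a relevance score.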