Question about the verification score

yangli18 / VLTVG

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022


I don't understand why S(x, y) in Eq. 1 can be interpreted as a relevance score. Also, the code computes verify_score with an element-wise multiplication and no transpose, which looks a little different from Eq. 1. Could you explain this further? Thanks a lot!

    text_embed = self.text_proj(text_info)
    img_embed = self.img_proj(img_feat)
    verify_score = (F.normalize(img_embed, p=2, dim=-1) *
                    F.normalize(text_embed, p=2, dim=-1)).sum(dim=-1, keepdim=True)
    verify_score = self.tf_scale * torch.exp(
        - (1 - verify_score).pow(self.tf_pow) / (2 * self.tf_sigma**2))
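For anyone reading along, here is a minimal pure-Python sketch of what the snippet above appears to compute at a single spatial location. It is not the repository's code: the helper names and the default values for tf_scale, tf_pow and tf_sigma are assumptions chosen for illustration only. The point it shows is that an element-wise product followed by a sum over the channel dimension is the same number as the inner product a^T b of the two normalized vectors, i.e. their cosine similarity, so no explicit transpose is needed.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (like F.normalize with p=2)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def cosine_by_elementwise(img_vec, text_vec):
    """Element-wise product summed over channels, i.e. the inner product a^T b."""
    a, b = l2_normalize(img_vec), l2_normalize(text_vec)
    return sum(x * y for x, y in zip(a, b))


def verify_score(img_vec, text_vec, tf_scale=1.0, tf_pow=2.0, tf_sigma=0.5):
    """Gaussian-shaped mapping of the cosine similarity.

    The default hyperparameters are hypothetical placeholders,
    not values taken from the repository.
    """
    cos = cosine_by_elementwise(img_vec, text_vec)
    return tf_scale * math.exp(-((1.0 - cos) ** tf_pow) / (2.0 * tf_sigma ** 2))


aligned = verify_score([1, 2, 3], [2, 4, 6])   # same direction, cosine = 1
orthogonal = verify_score([1, 0], [0, 1])      # cosine = 0
opposite = verify_score([1, 0], [-1, 0])       # cosine = -1
print(aligned, orthogonal, opposite)
```

Under these assumptions the score peaks at tf_scale when the visual and text vectors point in the same direction and decays smoothly as the cosine similarity drops, which is one way to read it as a relevance score. If Eq. 1 writes the similarity as an inner product with a transpose, the element-wise form in the code would be the same quantity evaluated independently at every location.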

Question about the verification score #11