yangli18 / VLTVG

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022

Question about the verification score #11

Open wildwolff opened 1 year ago

wildwolff commented 1 year ago

I don't understand why S(x, y) in Eq. 1 can be interpreted as a relevance score. Also, the code computes verify_score by element-wise multiplication without a transpose, which differs slightly from Eq. 1. Could you explain further? Thanks a lot!

text_embed = self.text_proj(text_info)
img_embed = self.img_proj(img_feat)
verify_score = (F.normalize(img_embed, p=2, dim=-1) * F.normalize(text_embed, p=2, dim=-1)).sum(dim=-1, keepdim=True)
verify_score = self.tf_scale * torch.exp(-(1 - verify_score).pow(self.tf_pow) / (2 * self.tf_sigma**2))

yangli18 commented 1 year ago

@wildwolff Hi, it's just a matter of implementation. The inner part of Eq. 1 essentially computes the inner product of two feature vectors. You could instead use bmm after transposing one of the vectors ([Bx1xC] * [BxCx1] = [Bx1x1]), which is equivalent to the way I implemented it.
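
For anyone reading along, the equivalence between the two formulations can be checked numerically. The snippet below is a minimal sketch, not the repository's code: the shapes (B, C), the random inputs, and the values of tf_scale, tf_sigma and tf_pow are assumptions picked only for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: B feature vectors with C channels each.
B, C = 4, 16
img_embed = torch.randn(B, C)
text_embed = torch.randn(B, C)

# L2-normalize along the channel dimension, as in the snippet quoted above.
img_n = F.normalize(img_embed, p=2, dim=-1)
text_n = F.normalize(text_embed, p=2, dim=-1)

# Variant 1: element-wise product followed by a sum over channels -> [B, 1].
score_mul = (img_n * text_n).sum(dim=-1, keepdim=True)

# Variant 2: batched matrix product [B, 1, C] x [B, C, 1] -> [B, 1, 1].
score_bmm = torch.bmm(img_n.unsqueeze(1), text_n.unsqueeze(2))

# Both variants compute the same per-sample inner product.
same = torch.allclose(score_mul, score_bmm.squeeze(-1), atol=1e-6)

# Gaussian-shaped mapping of the similarity; these parameter values are
# illustrative assumptions, not the values used in the repository.
tf_scale, tf_sigma, tf_pow = 1.0, 0.5, 2.0
verify_score = tf_scale * torch.exp(
    -(1 - score_mul).pow(tf_pow) / (2 * tf_sigma ** 2)
)
```

Because both vectors are L2-normalized, the inner product is their cosine similarity and lies in [-1, 1]. The exponential then maps it to a value that approaches tf_scale as the two features align and decays toward zero as they diverge, which is one way to read the result as a relevance score.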