jpthu17 / HBI

[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Apache License 2.0

What's the function of the following code in BanzhafInteraction Class #4

Closed. May2333 closed this issue 1 year ago.

May2333 commented 1 year ago

Thank you for your great work! I'm still unsure how the Banzhaf Interaction works in the following code:

        s_t = (torch.rand((self.t_len)) > 0.5).long().to(retrieve_logits.device) 
        s_j = (torch.rand((self.v_len)) > 0.5).long().to(retrieve_logits.device) 
        s_t[i], s_j[j] = 0, 0

        _text_mask, _video_mask = text_mask.clone(), video_mask.clone()
        _text_mask[:, s_t] = 0
        _video_mask[:, s_j] = 0

Does _text_mask[:, s_t] = 0 mask only the first and second word tokens, since the values in s_t and s_j are only 0 and 1? Or have I misunderstood it? Any reply would be helpful!
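The behaviour in question can be checked in isolation. The sketch below is not taken from the repository: buggy_mask and the toy tensors are hypothetical names used only for illustration. It suggests that PyTorch treats a long tensor of 0/1 values as a list of column indices, so only columns 0 and 1 are ever zeroed, whereas a boolean tensor selects positions.

```python
import torch


def buggy_mask(mask, s):
    """Reproduce the `_mask[:, s] = 0` pattern from the snippet above."""
    out = mask.clone()
    out[:, s] = 0  # s has dtype long, so its values (0 and 1) act as column indices
    return out


mask = torch.ones(2, 6, dtype=torch.long)  # toy mask: batch of 2, length 6
s = torch.tensor([0, 1, 1, 0, 1, 0])       # a 0/1 sample stored as long

buggy = buggy_mask(mask, s)
print(buggy.tolist())    # [[0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1]]

# With a boolean tensor, the same syntax selects positions instead of indices.
by_bool = mask.clone()
by_bool[:, s == 0] = 0   # zero every column where the sample is 0
print(by_bool.tolist())  # [[0, 1, 1, 0, 1, 0], [0, 1, 1, 0, 1, 0]]
```

In the first case the random sample has no effect beyond whether index 1 appears at all, which matches the concern raised in the question.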

jpthu17 commented 1 year ago

Thank you for raising this issue! You are right: there was a bug in the code we provided, which we have now fixed:

        s_t = (torch.rand(text_mask.size()) > 0.5).long().to(retrieve_logits.device)
        s_j = (torch.rand(video_mask.size()) > 0.5).long().to(retrieve_logits.device)
        s_t[:, i], s_j[:, j] = 0, 0

        _text_mask, _video_mask = text_mask.clone(), video_mask.clone()
        _text_mask.masked_fill_((1 - s_t).to(torch.bool), 0)
        _video_mask.masked_fill_((1 - s_j).to(torch.bool), 0)

Here s_t and s_j randomly sample a subset of players: an entry of 1 keeps that position in the subset, and players i and j are always excluded. Thank you very much for reporting this; it has been incredibly helpful. The code in the repository has been updated.
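For readers who want to try the corrected sampling outside the class, here is a minimal sketch under stated assumptions. The helper name sample_subset_mask, the generator argument, and the toy text_mask are hypothetical and not part of the repository; the mask is assumed to have shape (batch, length) with 1 marking a valid position.

```python
import torch


def sample_subset_mask(mask, excluded, generator=None):
    """Keep each position with probability 0.5, always dropping `excluded`.

    Hypothetical helper mirroring the corrected snippet above; `mask` is
    assumed to be (batch, length) with 1 for valid positions.
    """
    keep = (torch.rand(mask.shape, generator=generator) > 0.5).long().to(mask.device)
    keep[:, excluded] = 0  # the player under evaluation is never in the subset
    # Zero every position that was not sampled; padding stays zero.
    return mask.clone().masked_fill_((1 - keep).to(torch.bool), 0)


g = torch.Generator().manual_seed(0)  # seeded only to make the demo repeatable
text_mask = torch.tensor([[1, 1, 1, 0],
                          [1, 1, 1, 1]])
sampled = sample_subset_mask(text_mask, excluded=2, generator=g)

print(sampled[:, 2].tolist())             # [0, 0]: the excluded column is always masked
print(bool((sampled <= text_mask).all())) # True: sampling never unmasks padding
```

Because the sample has the same shape as the mask, each row in the batch draws its own subset, and positions are selected by a boolean mask rather than by integer index.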

May2333 commented 1 year ago

I appreciate your quick reply. May I ask whether this bug could affect the results in the paper?

jpthu17 commented 1 year ago

I rewrote this code, and the bug was introduced during the rewrite, so it does not affect the results in the paper. The original code was on a company computer, and I did not have a copy when I left.