Thank you for raising this issue! You are right: there was a bug in the code we provided, which we have now fixed:
# Sample a random coalition: keep each text token / video frame with probability 0.5.
s_t = (torch.rand(text_mask.size()) > 0.5).long().to(retrieve_logits.device)
s_j = (torch.rand(video_mask.size()) > 0.5).long().to(retrieve_logits.device)
# Exclude players i and j themselves from the sampled coalition.
s_t[:, i], s_j[:, j] = 0, 0
# Zero out the masks at every position outside the coalition
# (boolean masking via masked_fill_, not integer indexing).
_text_mask, _video_mask = text_mask.clone(), video_mask.clone()
_text_mask.masked_fill_((1 - s_t).to(torch.bool), 0)
_video_mask.masked_fill_((1 - s_j).to(torch.bool), 0)
We randomly sample a subset of players via s_t and s_j. Thank you again for reporting this issue; your contribution is greatly appreciated, and the current code has been fixed.
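For readers trying to connect this snippet to the paper: below is a minimal Monte-Carlo sketch of how such sampled coalitions are typically used to estimate the Banzhaf interaction B(i, j) = E_S[v(S∪{i,j}) - v(S∪{i}) - v(S∪{j}) + v(S)]. The function banzhaf_interaction and the value function v here are hypothetical placeholders for illustration, not the repository's actual API:

import torch

def banzhaf_interaction(v, text_mask, video_mask, i, j, n_samples=64):
    # Monte-Carlo estimate of the Banzhaf interaction
    #   B(i, j) = E_S[ v(S+{i,j}) - v(S+{i}) - v(S+{j}) + v(S) ],
    # where S is a uniformly sampled coalition that excludes players i and j.
    # `v` is a hypothetical value function scoring a (text_mask, video_mask)
    # coalition, e.g. a retrieval similarity of the masked text-video pair.
    est = 0.0
    for _ in range(n_samples):
        s_t = (torch.rand(text_mask.size()) > 0.5).long()
        s_j = (torch.rand(video_mask.size()) > 0.5).long()
        s_t[:, i], s_j[:, j] = 0, 0  # S must not contain i or j

        def coalition(add_i, add_j):
            t, m = text_mask.clone(), video_mask.clone()
            t.masked_fill_((1 - s_t).to(torch.bool), 0)
            m.masked_fill_((1 - s_j).to(torch.bool), 0)
            if add_i:
                t[:, i] = text_mask[:, i]  # put player i back into S
            if add_j:
                m[:, j] = video_mask[:, j]  # put player j back into S
            return t, m

        est += (v(*coalition(True, True)) - v(*coalition(True, False))
                - v(*coalition(False, True)) + v(*coalition(False, False)))
    return est / n_samples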
I appreciate your quick reply. But may I ask whether this bug could influence the results in the paper?
I rewrote this code myself, and some bugs were introduced during the rewrite, so it does not affect the results in the paper. The original code was on a company computer, and I did not keep a copy when I left.
Thank you for your great work! But I'm still wondering how the Banzhaf Interaction works in the following code:
Does _text_mask[:, s_t] = 0 mean masking the first and second word tokens, since the values in s_t and s_j are only 0 and 1? Or do I just have the wrong understanding of it? Any reply would be helpful!
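For anyone reading along, that understanding is essentially why the old line was a bug: under PyTorch integer (advanced) indexing, the 0/1 entries of s_t are interpreted as column indices, so only columns 0 and 1 are ever zeroed, regardless of which subset was sampled. A minimal, self-contained demonstration (shapes and values made up for illustration):

import torch

text_mask = torch.ones(2, 5, dtype=torch.long)
s_t = torch.tensor([[1, 0, 1, 1, 0],
                    [0, 1, 0, 1, 1]])  # a sampled 0/1 coalition

# Buggy version: integer indexing. The 0/1 values of s_t are treated as
# column indices, so only tokens 0 and 1 are masked in every row.
buggy = text_mask.clone()
buggy[:, s_t] = 0
print(buggy)   # tensor([[0, 0, 1, 1, 1],
               #         [0, 0, 1, 1, 1]])

# Fixed version: boolean masking. Positions where s_t == 0 (outside the
# sampled coalition) are zeroed; positions where s_t == 1 are kept.
fixed = text_mask.clone()
fixed.masked_fill_((1 - s_t).to(torch.bool), 0)
print(fixed)   # tensor([[1, 0, 1, 1, 0],
               #         [0, 1, 0, 1, 1]])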