Closed EdenGabriel closed 8 months ago
Hello Eden,
If you want to remove the dummy tokens, you also have to deactivate the losses for ms_align, distill, and orthogonal dummy.
Furthermore, you have to comment out the corresponding parts in `def loss_saliency` (we have marked 'Saliency loss to t2v attn weights' in the code).
Also, make sure that the number-of-dummies hyperparameter is set to 0, or modify `crossattention.py` (since we use the number of dummies to index the video features).
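The gating described above could be sketched roughly like this. Note that every name here (`compute_losses`, `num_dummies`, the placeholder loss values) is a hypothetical stand-in for illustration, not the repository's actual API:

```python
# Rough sketch: gate the dummy-related losses behind the dummy count.
# All names and values here are hypothetical, not the repo's actual API.

def compute_losses(outputs, targets, num_dummies):
    # Losses that do not depend on dummy tokens stay active.
    losses = {"span": 0.5, "saliency": 0.2}  # placeholder values

    # Dummy-dependent losses only make sense when dummy tokens exist;
    # with num_dummies == 0 they are skipped entirely.
    if num_dummies > 0:
        losses["ms_align"] = 0.1
        losses["distill"] = 0.1
        losses["orthogonal_dummy"] = 0.1
    return losses
```

With `num_dummies = 0`, the dummy-specific terms never enter the total loss, which matches the suggestion to set the hyperparameter to 0 rather than edit `crossattention.py`.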
OK, thanks for your patient reply. I'll try it.
Sorry to bother you again. As per your suggestion, I have removed the dummy tokens and it works. But I still have some doubts: ① What are the meaning and function of "t2vattnvalues_neg" and "false_neg_mask"? ② When I train only the moment retrieval task (charades_sta or tacos), why is "lw_saliency" set to 4 rather than 0? ③ What is the meaning of "token_type_embeddings"? I don't understand why you need to add these two embeddings to `src_vid` and `src_txt`:
```python
src_vid = src_vid + self.token_type_embeddings(torch.full_like(src_vid_mask.long(), 1))  # (bsz, L_vid, d)
src_txt = src_txt + self.token_type_embeddings(torch.zeros_like(src_txt_mask.long()))
```
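For context on those two lines: token-type embeddings act like BERT's segment embeddings, i.e. one learned vector per modality (type 1 for video tokens, type 0 for text tokens) added to the features so a shared transformer can tell the modalities apart after concatenation. Below is a minimal self-contained sketch; the dimensions and tensor values are made up for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 4  # illustrative size, not the model's real hidden dim

# Two learned type vectors: index 0 = text, index 1 = video.
token_type_embeddings = nn.Embedding(2, d_model)

src_vid = torch.randn(2, 5, d_model)   # (bsz, L_vid, d)
src_vid_mask = torch.ones(2, 5)
src_txt = torch.randn(2, 3, d_model)   # (bsz, L_txt, d)
src_txt_mask = torch.ones(2, 3)

# Every video token gets the same "video" vector added; every text token
# gets the "text" vector. The feature shapes are unchanged.
src_vid = src_vid + token_type_embeddings(torch.full_like(src_vid_mask.long(), 1))
src_txt = src_txt + token_type_embeddings(torch.zeros_like(src_txt_mask.long()))

# After concatenation along the sequence axis, positions from the two
# modalities carry a consistent, learnable per-modality offset.
src = torch.cat([src_vid, src_txt], dim=1)  # (bsz, L_vid + L_txt, d)
```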
Thanks.
Thank you for your patient reply. I see.
Hi, thank you for your interesting work and code implementation. I want to explore the results without the dummy tokens, but when I comment out the dummy-related parts of the code, I always get the error
`assert (spans1[:, 1] >= spans1[:, 0]).all()`
Can you help me? The commented code in model.py is as follows. Thanks.