starmemda / CAMoE

Does DSL only work when captions and videos are paired along the diagonal? #6

fly-dragon211 closed this issue 2 years ago

fly-dragon211 commented 2 years ago

It seems that captions and videos must be paired one-to-one along the diagonal of the similarity matrix.

I am trying to evaluate DSL on the MSRVTT full split (2990 videos and 2990*20 captions), but DSL does not work there. However, on the MSRVTT 1k split (1000 videos and 1000 captions), it works well (49.0% V2T-R@1 and 47.8% T2V-R@1). My model is CLIP4Clip.

Therefore, it seems that the video-text matching information needs to be known in advance. Could you report comparative experiments with random shuffling at evaluation time? If random shuffling invalidates DSL, I would suspect a data leak.
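
For anyone who wants to run the shuffle check themselves, here is a minimal sketch in NumPy. It assumes that DSL at inference time is applied as a re-ranking step: the similarity matrix is multiplied by a softmax prior taken along the other axis. The function name `dsl_rerank`, the temperature of 100, and the choice of axis are assumptions for illustration, not taken from this repository. Under that assumption, shuffling the rows and columns only permutes the re-ranked scores in the same way, so a correct evaluation should give identical metrics as long as the ground-truth labels are shuffled with the data.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dsl_rerank(sim, temp=100.0):
    # Hypothetical helper: sim has shape (num_texts, num_videos).
    # Each score is scaled by a prior computed over the text axis (axis 0).
    prior = softmax(sim * temp, axis=0)
    return sim * prior

rng = np.random.default_rng(0)
sim = rng.uniform(-1.0, 1.0, size=(6, 4))  # toy scores, not real data
row_perm = rng.permutation(6)              # shuffle the texts
col_perm = rng.permutation(4)              # shuffle the videos

reranked = dsl_rerank(sim)
reranked_shuffled = dsl_rerank(sim[row_perm][:, col_perm])

# Re-ranking the shuffled matrix equals shuffling the re-ranked matrix.
shuffle_is_equivariant = np.allclose(
    reranked[row_perm][:, col_perm], reranked_shuffled
)
print(shuffle_is_equivariant)
```

If this check passes but the retrieval metrics still change after shuffling, the difference more likely comes from how ground-truth pairs are looked up than from the re-ranking itself.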

starmemda commented 2 years ago

Maybe you could re-check your code for the multi-sentence test? In our test on MSR-VTT full, the improvement seems to be even more obvious. As for the casual test we ran, it shows no difference. Maybe you can regard DSL as a re-ranking method first? Treating it as a loss may further improve performance, though.
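
One thing worth checking in a multi-sentence test is whether recall is computed against an explicit caption-to-video mapping rather than the diagonal. The sketch below is only an illustration with toy data: the names `text_to_video`, `t2v_recall_at_1`, and `v2t_recall_at_1` are hypothetical and do not come from CAMoE or CLIP4Clip. It shows that a diagonal-based metric breaks as soon as the similarity matrix is not square, even when every caption retrieves the right video.

```python
import numpy as np

def t2v_recall_at_1(sim, text_to_video):
    # sim: (num_texts, num_videos); text_to_video[i] is the index of the
    # ground-truth video for text i.
    top1_video = sim.argmax(axis=1)
    return float((top1_video == text_to_video).mean())

def v2t_recall_at_1(sim, text_to_video):
    # A video counts as a hit if its top-scoring text is one of its own captions.
    top1_text = sim.argmax(axis=0)
    video_ids = np.arange(sim.shape[1])
    return float((text_to_video[top1_text] == video_ids).mean())

# Toy setup: 3 videos with 2 captions each, so the matrix is 6 x 3.
text_to_video = np.repeat(np.arange(3), 2)   # [0, 0, 1, 1, 2, 2]
sim = np.full((6, 3), 0.1)
sim[np.arange(6), text_to_video] = 0.9       # every caption matches its video

t2v = t2v_recall_at_1(sim, text_to_video)
v2t = v2t_recall_at_1(sim, text_to_video)

# A diagonal assumption (text i belongs to video i) is wrong for this layout.
diagonal_r1 = float((sim.argmax(axis=1) == np.arange(6)).mean())
print(t2v, v2t, diagonal_r1)
```

The same mapping can be reused after any re-ranking step, since re-ranking changes the scores but not which video each caption belongs to.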

celestialxevermore commented 1 year ago

@fly-dragon211 Hi, I'm a beginner at this task, TVR. Could you help me with how to reproduce DSL on CLIP4Clip? Thank you in advance.

ZijianCHEN-infodeliver commented 3 months ago

Hi, I'm a beginner at this task, TVR. Could you help me with how to reproduce DSL on CLIP4Clip? Thank you in advance.

Hello, I am also trying to use DSL with CLIP4Clip, but the results are not satisfactory. Have you successfully reproduced it?