showlab / all-in-one

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
https://arxiv.org/abs/2203.07303

Video Retrieval MSRVTT train/test split. #5

Closed FuTSy13 closed 2 years ago

FuTSy13 commented 2 years ago

Hello!

Could you please tell me which train/test split you used for the results reported in the paper? I see that the jsfusion split is hardcoded in AllInOne/datasets/msrvtt.py. Did you use only the jsfusion test set for both train-7k and train-9k?

Also note that when you report results here, you should use the 'full' split.
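For orientation, here is a minimal sketch of the MSR-VTT retrieval splits this thread distinguishes. It assumes the commonly cited clip counts from the wider literature: the official 'full' split of 6,513 train / 497 val / 2,990 test, and the 1,000-clip jsfusion test set paired with either 9,000 or 7,010 training clips. These counts are not taken from this repository. The split labels and the `split_sizes` helper are hypothetical illustrations, not identifiers from AllInOne/datasets/msrvtt.py.

```python
# Hypothetical summary of commonly cited MSR-VTT retrieval splits (clip counts).
# Labels and numbers are illustrative assumptions, not the all-in-one repo's API.
MSRVTT_TOTAL_CLIPS = 10_000

MSRVTT_SPLITS: dict[str, dict[str, int]] = {
    # Official "full" split: separate train/val sets and the 2,990-clip test set.
    "full": {"train": 6513, "val": 497, "test": 2990},
    # jsfusion 1,000-clip test set with the 9k training set.
    "jsfusion_9k": {"train": 9000, "test": 1000},
    # jsfusion 1,000-clip test set with the 7k training set.
    "jsfusion_7k": {"train": 7010, "test": 1000},
}


def split_sizes(name: str) -> dict[str, int]:
    """Return clip counts for a named split; raise ValueError for unknown names."""
    try:
        sizes = MSRVTT_SPLITS[name]
    except KeyError:
        raise ValueError(
            f"unknown MSR-VTT split {name!r}; choose from {sorted(MSRVTT_SPLITS)}"
        ) from None
    # Sanity check: a split never uses more clips than the dataset contains.
    assert sum(sizes.values()) <= MSRVTT_TOTAL_CLIPS
    return dict(sizes)  # return a copy so callers cannot mutate the table
```

Making the split an explicit, named parameter (rather than hardcoding one) makes it harder to compare results across different test sets by accident.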

FingerRec commented 2 years ago

Hi.

Thanks for your interest in our work. We follow the train/test splits of TACo [1]. The results reported for this paper on this leaderboard use Split 2 (the 7K/3K split; see Table 8).

However, following your reminder, I found that CLIP2Video uses the full split. Thanks for your kind feedback; we have updated our entry on this leaderboard accordingly.

[1] Yang et al. TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment. ICCV 2021. https://paperswithcode.com/paper/taco-token-aware-cascade-contrastive-learning