cdqncn closed 2 years ago
Hi, the result reported in the paper is for text-to-video retrieval, whereas my evaluation code outputs results for both video-to-text and text-to-video. Is it possible that your R1 is for video-to-text?
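To illustrate why the two directions can give different R1 values, here is a minimal sketch that computes recall at 1 from a single similarity matrix in both directions. It is not the repository's evaluation code: the function name `recall_at_k`, the toy similarity values, and the assumption that text i is paired with video i (one caption per video) are all hypothetical.

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Recall@k in percent, assuming query i's correct match is candidate i.

    sim[i, j] is the similarity between query i and candidate j.
    """
    gt = np.diag(sim)                        # score of the correct candidate
    # Rank = number of candidates scoring strictly higher than the correct one.
    ranks = (sim > gt[:, None]).sum(axis=1)
    return float((ranks < k).mean() * 100)

# Toy text-by-video similarity matrix (hypothetical values).
sim_t2v = np.array([
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
])

t2v_r1 = recall_at_k(sim_t2v)      # rows are text queries
v2t_r1 = recall_at_k(sim_t2v.T)    # rows are video queries
print(f"text-to-video R1: {t2v_r1:.2f}, video-to-text R1: {v2t_r1:.2f}")
```

On this toy matrix the second text ranks the wrong video first, so text-to-video R1 is lower than video-to-text R1 even though both come from the same scores.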
Thanks for your reply!
You are right, my R1 is for video-to-text! I have learned a lot from your work on ALBEF and BLIP, and I am looking forward to your code for videoQA. Thanks a lot again!
Yours sincerely,
Dear author,
I tried to use the released checkpoints (model_base_retrieval_coco.pth and model_large_retrieval_coco.pth) to test on the MSRVTT retrieval dataset, but I got an R1 of 35.8 and 39.74, respectively. How can I get the 43.3 reported in your arXiv paper? Thanks!
Yours sincerely,