Hi! Thanks for the open-sourced code!
I wonder if you have conducted zero-shot experiments on MSRVTT or other downstream datasets. I get the following performance on standard text-to-video retrieval:
R1 7.0
R5 16.6
R10 23.4
MR 68.5
I am trying to make sure my pipeline is correct (with the UniVL model and my own trainer pipeline). Do you have zero-shot numbers on MSRVTT for comparison?
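For reference, here is a minimal sketch of how retrieval metrics like the ones reported here are commonly computed from a text-to-video similarity matrix. It is an illustration only and is not taken from the UniVL code. It assumes that text i is paired with video i (ground truth on the diagonal), that R1/R5/R10 are recall percentages, and that MR denotes median rank. The function name `retrieval_metrics` is hypothetical.

```python
from statistics import median


def retrieval_metrics(sim):
    """Compute R1/R5/R10 (in percent) and median rank.

    sim[i][j] is the similarity between text i and video j.
    Assumption: the correct video for text i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # 1-based rank of the correct video: one plus the number of
        # videos scored strictly higher (ties resolve in our favour).
        rank = 1 + sum(1 for score in row if score > row[i])
        ranks.append(rank)

    n = len(ranks)

    def recall_at(k):
        # Percentage of queries whose correct video is in the top k.
        return 100.0 * sum(1 for r in ranks if r <= k) / n

    return {
        "R1": recall_at(1),
        "R5": recall_at(5),
        "R10": recall_at(10),
        "MR": median(ranks),  # assumption: MR means median rank
    }
```

Running this on the same similarity matrix that a custom trainer produces is one way to rule out a metric-side bug before comparing against published numbers.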