microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
https://arxiv.org/abs/2002.06353
MIT License

Estimate of zero-shot performance #45

Open bpiyush opened 2 years ago

bpiyush commented 2 years ago

Hi! Thanks for the open-sourced code!

I wonder if you have conducted zero-shot experiments on MSRVTT or other downstream datasets. I get the following performance on standard text-to-video retrieval:

R1     7.0
R5     16.6
R10    23.4
MR     68.5

I am trying to make sure my pipeline is correct (with the UniVL model and my own trainer pipeline). Do you have zero-shot numbers on MSRVTT for comparison?
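
For anyone checking a similar pipeline, here is a minimal sketch of how these metrics are commonly computed from a text-to-video similarity matrix. It is not taken from the UniVL code: the function name `retrieval_metrics` is hypothetical, it assumes the matching video for query `i` sits at column `i`, and it assumes MR means median rank (it could also mean mean rank, which would give different numbers).

```python
import numpy as np


def retrieval_metrics(sim):
    """Recall@K and median rank for text-to-video retrieval.

    Assumption: sim[i, j] is the similarity between text query i and
    video j, and the ground-truth video for query i is at index i.
    """
    sim = np.asarray(sim, dtype=float)
    # Score of the ground-truth video for each query, as a column vector.
    gt = np.diag(sim)[:, None]
    # Rank of the ground truth: 1 + number of videos scoring strictly higher.
    ranks = 1 + (sim > gt).sum(axis=1)
    return {
        "R1": 100.0 * float(np.mean(ranks <= 1)),
        "R5": 100.0 * float(np.mean(ranks <= 5)),
        "R10": 100.0 * float(np.mean(ranks <= 10)),
        "MR": float(np.median(ranks)),  # assumed: median rank, 1-indexed
    }
```

Calling `retrieval_metrics(sim)` on the full query-by-video score matrix returns recall values as percentages. Ties are resolved in favour of the ground truth here, so a pipeline that breaks ties differently may report slightly lower recall.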

ArrowLuo commented 2 years ago

Hi @bpiyush, sorry for the delayed reply. Unfortunately, we have no zero-shot results to share.