xuguohai / X-CLIP

An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
https://arxiv.org/abs/2207.07285
MIT License
137 stars 15 forks

Reproduction on MSVD Dataset #9

Open bzy22 opened 2 weeks ago

bzy22 commented 2 weeks ago

Sorry to disturb you. When I try to reproduce the results on the MSVD dataset, I get worse results than those reported in the paper. My R@1 on video-to-text retrieval is 50.0, about 10 points lower than the 60.9 in the paper. This problem has confused me for a long time, and I would appreciate any advice on how to address it.
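For context, R@1 is the percentage of queries whose correct match is ranked first. Below is a minimal sketch of that computation for video-to-text retrieval, assuming a precomputed video-by-text similarity matrix in which text i is the single correct match for video i. The function name `recall_at_k` and the toy matrix are illustrative and are not taken from the X-CLIP code; MSVD has several captions per video, so the repository's actual evaluation may handle matches differently.

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Recall@K in percent for video-to-text retrieval.

    Assumption: sim[i, j] is the similarity between video i and text j,
    and text i is the only correct match for video i.
    """
    sim = np.asarray(sim, dtype=float)
    # Similarity of each video to its ground-truth text (the diagonal).
    gt = np.diag(sim)
    # Rank of the ground truth = number of texts scoring strictly higher.
    ranks = (sim > gt[:, None]).sum(axis=1)
    # A query counts as a hit if its ground truth is within the top k.
    return 100.0 * float(np.mean(ranks < k))

# Toy example: videos 0 and 2 rank their own text first, video 1 does not.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.2, 0.8],
                [0.1, 0.4, 0.7]])
print(recall_at_k(sim, k=1))
print(recall_at_k(sim, k=3))
```

With a metric of this form, a gap in R@1 can come from either the similarity scores themselves or from how the ground-truth matches are defined, so both are worth checking when comparing against reported numbers.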
