jayleicn / singularity

[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
https://arxiv.org/abs/2206.03428
MIT License

How many files were available during pre-training on WebVid+CC3M? #19

Closed prote376 closed 2 years ago

prote376 commented 2 years ago

Thank you for sharing this code.

I reproduced the pre-training by following the script in this GitHub repository.

The results I got (below) are much lower than those reported in the paper. (Zero-shot results on MSRVTT with WebVid+CC3M pre-training in the paper: 28.4, 50.2, 59.5.)

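| | txt_r1 | txt_r5 | txt_r10 | txt_r_mean | img_r1 | img_r5 | img_r10 | img_r_mean | r_mean |
|---|---|---|---|---|---|---|---|---|---|
| msrvtt_1k_test/ | 22.3 | 40.3 | 49.3 | 37.3 | 25.6 | 46.0 | 55.8 | 42.47 | 39.88 |
| msrvtt_1k_test_emb/ | 17.8 | 34.6 | 43.9 | 32.1 | 20.3 | 40.8 | 51.5 | 37.53 | 34.82 |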
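
For readers comparing these numbers, the mean columns appear to be plain averages: `txt_r_mean` and `img_r_mean` average the three recall values, and `r_mean` averages those two means. This is inferred from the reported figures, not taken from the repository's evaluation code. A minimal sketch that reproduces the `msrvtt_1k_test/` row under that assumption (the function name `recall_means` is hypothetical):

```python
def recall_means(txt_recalls, img_recalls):
    """Average R@1/R@5/R@10 per direction, then average the two means.

    Assumption: this mirrors how the *_mean columns were computed;
    it is inferred from the reported numbers only.
    """
    txt_mean = sum(txt_recalls) / len(txt_recalls)
    img_mean = sum(img_recalls) / len(img_recalls)
    overall = (txt_mean + img_mean) / 2
    return txt_mean, img_mean, overall


# Values from the msrvtt_1k_test/ row reported in this issue.
txt_mean, img_mean, r_mean = recall_means([22.3, 40.3, 49.3], [25.6, 46.0, 55.8])
print(round(txt_mean, 2), round(img_mean, 2), round(r_mean, 2))
```

The same check also matches the `msrvtt_1k_test_emb/` row, which suggests the table is internally consistent and the gap to the paper is not a reporting error.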

I suspect this is caused by missing data, since some files were no longer available to download.

How many files were available during pre-training on WebVid+CC3M?

jayleicn commented 2 years ago

The appendix of the paper has data statistics that exactly match the number of files used in the paper. If your file counts are similar, you can check other factors, such as the number of frames used.

prote376 commented 2 years ago

I checked, and my CC3M dataset has only 2.18M files, compared to 2.95M in yours. I will review my download process. Thank you!
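
A shortfall like this can be caught before pre-training by counting the files on disk and comparing against the expected total. The following is a minimal sketch, not part of the Singularity codebase: the helper names `count_files` and `coverage`, the directory layout, and the file extensions are all assumptions for illustration. The two counts are the ones quoted in this thread.

```python
from pathlib import Path


def count_files(root, suffixes=(".jpg", ".jpeg", ".png")):
    """Recursively count files under `root` with one of the given suffixes.

    Assumption: the dataset is stored as individual image files;
    adjust `suffixes` for video files or other formats.
    """
    wanted = {s.lower() for s in suffixes}
    return sum(
        1
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def coverage(found, expected):
    """Fraction of the expected files that are actually present."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return found / expected


# Counts quoted in this thread: 2.18M downloaded vs. 2.95M used by the authors.
ratio = coverage(2_180_000, 2_950_000)
print(f"CC3M coverage: {ratio:.1%}")
```

To check a local copy, call `count_files` on the dataset directory (for example a hypothetical `data/cc3m/images`) and pass the result to `coverage` together with the count from the paper's appendix.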