prote376 closed this issue 2 years ago
The data statistics in the appendix match exactly the number of files used in the paper. If you have a similar number of files, you can also check the number of frames used, etc.
I checked, and my CC3M dataset has only 2.18M, compared to 2.95M in yours. I will check my download process. Thank you!
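For anyone else comparing their download against the appendix, here is a minimal sketch of how the local file count could be checked. The directory layout and the helper names `count_files_by_extension` and `coverage` are my own assumptions for illustration, not part of the repository.

```python
import os
from collections import Counter


def count_files_by_extension(root):
    """Walk `root` recursively and count files per lowercase extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
    return counts


def coverage(found, expected):
    """Fraction of the expected number of files that is actually present."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return found / expected


# Hypothetical usage (the path is an assumption, adjust to your setup):
#   counts = count_files_by_extension("/path/to/cc3m")
#   print(counts.most_common())
#   print(f"{coverage(sum(counts.values()), 2_950_000):.1%} of 2.95M")
```

With the numbers in this thread, 2.18M out of 2.95M is roughly 74% of the files.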
Thank you for sharing this code.
I reproduced the pre-training by following the script in the GitHub repository.
I got the results below, which are much lower than those in the paper. (Zero-shot results on MSRVTT with WebVid+CC3M in the paper: 28.4, 50.2, 59.5.)
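| | txt_r1 | txt_r5 | txt_r10 | txt_r_mean | img_r1 | img_r5 | img_r10 | img_r_mean | r_mean |
|---|---|---|---|---|---|---|---|---|---|
| msrvtt_1k_test/ | 22.3 | 40.3 | 49.3 | 37.3 | 25.6 | 46.0 | 55.8 | 42.47 | 39.88 |
| msrvtt_1k_test_emb/ | 17.8 | 34.6 | 43.9 | 32.1 | 20.3 | 40.8 | 51.5 | 37.53 | 34.82 |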
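As a sanity check on the numbers above, the mean columns appear to be plain averages of the R@1/R@5/R@10 values, with r_mean the average of the two directional means. This is inferred from the numbers themselves, not confirmed from the evaluation code, and `recall_means` is a hypothetical helper name.

```python
def recall_means(txt_recalls, img_recalls):
    """Return (txt_r_mean, img_r_mean, r_mean) as plain arithmetic means.

    Assumption: each directional mean averages R@1, R@5, R@10, and
    r_mean averages the two directional means.
    """
    txt_mean = sum(txt_recalls) / len(txt_recalls)
    img_mean = sum(img_recalls) / len(img_recalls)
    return txt_mean, img_mean, (txt_mean + img_mean) / 2


# Values from the msrvtt_1k_test/ row
txt_mean, img_mean, r_mean = recall_means([22.3, 40.3, 49.3], [25.6, 46.0, 55.8])
print(round(txt_mean, 2), round(img_mean, 2), round(r_mean, 2))
```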
I suspect this is caused by missing data that was no longer available to download.
How many files were available during pre-training on WebVid+CC3M?