Closed by HanielF 2 years ago
Hi,
The code works with similarity matrices, which in our case have the shape 'queries x videos'. So, the code expects the following:
In our final experiments we used all the captions from the training set. In practice, for MSRVTT we observed that 5000 queries already give good results; however, better results are obtained by using the whole dataset (as observed in Fig 3a). We only used 5000 for some ablations, and we obtained the best results using all the queries.
Ioana
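To make the 'queries x videos' convention and the 5000-query ablation concrete, here is a minimal sketch of subsampling rows of a training-queries x test-videos similarity matrix. The function name `subsample_query_bank`, the toy sizes, and the uniform random sampling are assumptions for illustration only; the thread does not say how the 5000 queries were chosen.

```python
import numpy as np


def subsample_query_bank(train_test_sims, num_queries, seed=0):
    """Keep a random subset of rows (training queries) of a queries x videos matrix.

    Hypothetical helper: uniform sampling without replacement is an assumption.
    """
    rng = np.random.default_rng(seed)
    n_queries = train_test_sims.shape[0]
    if num_queries >= n_queries:
        # Nothing to subsample: use the whole query bank.
        return train_test_sims
    rows = rng.choice(n_queries, size=num_queries, replace=False)
    # Sort so the kept rows stay in their original order.
    return train_test_sims[np.sort(rows)]


# Toy stand-in for a real matrix: 20000 training queries x 100 test videos.
rng = np.random.default_rng(0)
train_test_sims = rng.standard_normal((20000, 100))

bank = subsample_query_bank(train_test_sims, 5000)
print(bank.shape)  # (5000, 100)
```

Rows index queries and columns index videos, so subsampling the query bank only ever drops rows; the video axis is left untouched.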
Thanks for your reply!
Another question: why is the shape of the train_test matrix for MSVD (97200, 670)?
I counted the queries in the training set and got 48774.
Hi, thanks for pointing that out! The CE code outputs a (max_query_per_video * nr_videos) x nr_videos matrix. In the case of MSVD, max_query_per_video=81. The additional entries are not valid numbers and should not be used. Because we forgot to take that into account when extracting the training_testing matrix for TT-CE+, the numbers for QB-Norm have now increased. Thanks again for pointing that out! We updated the sims matrix (https://www.robots.ox.ac.uk/~vgg/research/teachtext/QB-Norm/msvd-sims.tar.gz) and also the results in the table.
Ioana
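The padding issue described above can be sketched as follows. The numbers in the thread are consistent with it: 97200 = 81 x 1200, while only 48774 rows correspond to real queries. This is a minimal sketch that assumes the padded rows are filled with NaN and that rows are laid out video by video; neither detail is confirmed in the thread, and `drop_padded_rows` is a hypothetical helper, not part of the CE or QB-Norm code.

```python
import numpy as np


def drop_padded_rows(sims):
    """Remove rows that contain any non-finite value (assumed to be padding)."""
    valid = np.isfinite(sims).all(axis=1)
    return sims[valid], valid


# Toy stand-in: 4 training videos, at most 3 captions each, 5 test videos.
max_query_per_video, nr_train_videos, nr_test_videos = 3, 4, 5
captions_per_video = [3, 1, 2, 3]  # 9 real queries in total

# Assumption: unused caption slots are left as NaN.
padded = np.full((nr_train_videos, max_query_per_video, nr_test_videos), np.nan)
rng = np.random.default_rng(0)
for video, n_captions in enumerate(captions_per_video):
    padded[video, :n_captions] = rng.standard_normal((n_captions, nr_test_videos))

# Flatten to (max_query_per_video * nr_videos) x nr_videos, as in the thread.
padded = padded.reshape(max_query_per_video * nr_train_videos, nr_test_videos)

clean, valid = drop_padded_rows(padded)
print(padded.shape, clean.shape)  # (12, 5) (9, 5)
```

If the invalid entries in the real matrix use a different sentinel, the mask would need to test for that value instead of `np.isfinite`.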
Thanks for your update! I will test it with the new similarity matrix.
I have some questions about the QB-Norm algorithm, described as follows.
The process described in the paper is