ioanacroi / qb-norm

Cross Modal Retrieval with Querybank Normalisation
https://vladbogo.github.io/QB-Norm/
MIT License

How to construct querybank probe matrix? #3

Closed HanielF closed 2 years ago

HanielF commented 2 years ago

I have some questions about the QB-Norm algorithm, described below.

The process described in the paper is: [image: 2022-05-06-15-40-06-JInsdc]

  1. Consider MSRVTT-1ka, where |G| = 1000 and N = 5000. Given a query, the process would be (1000 x 5000) x (5000 x 1) = (1000 x 1). Is |G| the number of videos in the test set, and N the number of text queries sampled from the training set? If so, why is a matrix of shape (5000 x 1) obtained instead of (1000 x 1) when a query is given? Does it mean the query video is used to compute similarities with all the querybank texts?
  2. How is the querybank constructed? I followed the paper and randomly sampled 5000 queries from the training set, but the results are really bad and are unstable when the querybank changes. Could you provide the script for constructing the querybank and computing the querybank probe matrix?
ioanacroi commented 2 years ago

Hi,

  1. The code works with similarity matrices, which in our case have the shape 'queries x videos'. So the code expects the following:

    • sims_train_test_path: a path to a similarity matrix containing nr_queries_from_training x nr_videos_testing similarities (this is the transpose of the probe matrix P described in the paper)
    • sims_test_path: a path to a similarity matrix containing nr_queries_from_testing x nr_videos_testing similarities

     If you don't obtain the right shape, please check the input matrix: it should be 'queries x videos', so you may need to transpose it.
  2. In our final experiments we used all the captions from the training set. In practice, for MSRVTT we observed good results with 5000 queries, but better results are obtained by using the whole dataset (as shown in Fig. 3a). We only used 5000 queries for some ablations; the best results came from using all the queries.
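
The shape convention above can be illustrated with a minimal NumPy sketch. It is not the repository's code: the function names `orient_queries_by_videos` and `inverted_softmax` and the temperature `beta` are hypothetical, and the normalisation shown is a plain inverted softmax, which is a simplification and may differ from what the paper and repository actually use.

```python
import numpy as np


def orient_queries_by_videos(sims, nr_videos):
    """Return `sims` with shape (queries, videos), transposing if needed.

    Hypothetical helper. A square matrix cannot be disambiguated by shape
    alone, so it is returned unchanged.
    """
    sims = np.asarray(sims, dtype=np.float64)
    if sims.ndim != 2:
        raise ValueError("expected a 2-D similarity matrix")
    if sims.shape[1] == nr_videos:
        return sims
    if sims.shape[0] == nr_videos:
        return sims.T
    raise ValueError(f"no axis of shape {sims.shape} matches {nr_videos} videos")


def inverted_softmax(sims_test, sims_train_test, beta=20.0):
    """Illustrative inverted-softmax normalisation (assumed, simplified).

    sims_test:       (nr_queries_from_testing,  nr_videos_testing)
    sims_train_test: (nr_queries_from_training, nr_videos_testing)
    beta:            temperature; the value here is an arbitrary assumption.
    """
    if sims_test.shape[1] != sims_train_test.shape[1]:
        raise ValueError("both matrices must share the videos axis")
    # One normaliser per test video, summed over all querybank queries.
    normaliser = np.exp(beta * sims_train_test).sum(axis=0, keepdims=True)
    # Videos that are similar to many querybank queries get down-weighted.
    return np.exp(beta * sims_test) / normaliser
```

For example, a matrix loaded as 'videos x queries' with 1000 test videos would be passed through `orient_queries_by_videos(sims, nr_videos=1000)` before being used.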

Ioana

HanielF commented 2 years ago

Thanks for your reply! Another question: why is the shape of the MSVD train_test matrix (97200, 670)? I counted the queries in the training set and got 48774.

ioanacroi commented 2 years ago

Hi, thanks for pointing that out! The CE code outputs a (max_query_per_video * nr_videos) x nr_videos matrix. In the case of MSVD, max_query_per_video = 81. The additional entries are not valid numbers and should not be used. Because we had forgotten to take that into account when extracting the training_testing matrix for TT-CE+, the numbers for QB-Norm have now increased. Thanks again for pointing that out! We updated the sims matrix (https://www.robots.ox.ac.uk/~vgg/research/teachtext/QB-Norm/msvd-sims.tar.gz) and also the results in the table.
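
A rough sketch of how padded rows like these could be dropped is given below. It rests on assumptions that this thread does not confirm: that rows are laid out video by video, with each video occupying `max_query_per_video` consecutive rows, and that the valid captions of each video come first within its slots. The helper names and the `captions_per_video` input are hypothetical. Note that 97200 = 81 x 1200, so the padded shape is consistent with 81 slots for each of 1200 videos.

```python
import numpy as np


def valid_row_mask(captions_per_video, max_query_per_video):
    """Boolean mask over padded rows, True where a real caption is assumed.

    Assumes (hypothetically) video-major layout with valid captions first.
    """
    counts = np.asarray(captions_per_video)
    if counts.max() > max_query_per_video:
        raise ValueError("a video has more captions than available slots")
    slots = np.arange(max_query_per_video)
    # Row (i, j) is valid when slot j is below the caption count of video i.
    return (slots[None, :] < counts[:, None]).reshape(-1)


def drop_padding(sims_padded, captions_per_video, max_query_per_video):
    """Keep only the rows of a padded 'queries x videos' matrix that are valid."""
    mask = valid_row_mask(captions_per_video, max_query_per_video)
    if sims_padded.shape[0] != mask.size:
        raise ValueError("row count does not match nr_videos * max_query_per_video")
    return sims_padded[mask]
```

Under these assumptions, the number of rows kept equals the total caption count, which gives a quick sanity check against the 48774 training queries counted above.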

Ioana

HanielF commented 2 years ago

Thanks for your update! I will test it with the new similarity matrix.