vandal-vpr / vg-transformers

Official Repository of "Learning Sequential Descriptors for Sequence-based Visual Place Recognition"
MIT License

summary #11

Closed by wpumain 1 year ago

wpumain commented 1 year ago

Thank you for your help. I finally understood the source code for CCT+SeqVLAD. Although the process was painful, I also learned a lot from your program. Thank you again for sharing it. I have a few questions, if I may ask:

1. https://github.com/vandal-vpr/vg-transformers/blob/c57fca9085a8dfcfd74e21ddea8f6722940de5dd/tvg/datasets/dataset.py#L40 What is the role of `torch.cat(tuple(triplets_local_indexes))`?
2. https://github.com/vandal-vpr/vg-transformers/blob/c57fca9085a8dfcfd74e21ddea8f6722940de5dd/tvg/datasets/dataset.py#L332 If only n sample sequences are selected from 1000 database sequences, are 1000 not too few? What if all 1000 selected sequences are within 25 meters of the query sequence? https://github.com/vandal-vpr/vg-transformers/blob/c57fca9085a8dfcfd74e21ddea8f6722940de5dd/tvg/datasets/dataset.py#L348 In that case, no negative samples can be selected.
3. What is the Melbourne dataset you mentioned in your paper?

ga1i13o commented 1 year ago

Hello.

  1. `triplets_local_indexes` is a list. Each item in the list corresponds to a triplet and contains the indexes of the query, the positive, and the negatives within that triplet. It is quite trivial: the query index is always 0, the positive is 1, and the negatives run from 2 to 2 + num_neg, because the triplets are built as (query, positive, neg1, neg2, ...). The `torch.cat` stacks this list into a single tensor; however, that tensor is not actually used anywhere in the code. Perhaps I should delete it if it is only a source of confusion. There is a minimal sketch of this structure at the end of this comment.

  2. You are right. I sample 1000 queries and then 1000 random database sequences among which negatives are chosen. Technically, it can happen that all of these 1000 random database sequences are positives and we are left with no negatives. However, this never happens in practice: the database contains roughly half a million images, and the 1000 queries we sample are spread across cities all around the world, so it is very unlikely that no possible negatives remain. This procedure was not introduced in our paper; using 1000 queries and 1000 database images is standard practice, called 'partial mining' (as opposed to full mining, where you extract features from the entire database to choose negatives). It was introduced in the MSLS paper (https://zaguan.unizar.es/record/106609/files/texto_completo.pdf), because MSLS is so big that it would take forever to extract features for the whole database. It is further studied, and compared to other mining techniques, in this paper: https://arxiv.org/abs/2204.03444. In practice, this technique turns out to work as well as full mining while being much more efficient. A rough sketch of the idea is included at the end of this comment.

  3. Melbourne is one of the cities in the MSLS dataset. For those experiments we only use images from that city. It can be obtained by passing cities=melbourne when instantiating the dataset class; see the short example at the end of this comment.
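
Regarding point 1, here is a minimal sketch (not the repository's exact code; the sizes are just illustrative) of how the local indexes are laid out and what the `torch.cat` produces:

```python
import torch

num_triplets = 4   # illustrative number of triplets in a batch
num_neg = 10       # negatives per triplet

triplets_local_indexes = []
for _ in range(num_triplets):
    # Each triplet is (query, positive, neg1, ..., neg_num_neg),
    # so its local indexes are simply 0, 1, 2, ..., 1 + num_neg.
    triplets_local_indexes.append(torch.arange(0, 2 + num_neg))

# Stack the per-triplet index tensors into one flat tensor, as in the
# dataset.py line referenced above; as noted, this tensor is not used
# anywhere downstream.
stacked = torch.cat(tuple(triplets_local_indexes))
print(stacked.shape)  # torch.Size([num_triplets * (2 + num_neg)]) = [48]
```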
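
Regarding point 2, here is a rough, self-contained sketch of partial mining as described above. It is only an illustration of the idea, not the code in this repository, and `extract_features` and `gps_distance_m` are hypothetical helpers:

```python
import numpy as np

def partial_mining(queries, database, extract_features, gps_distance_m,
                   n_queries=1000, n_db=1000, num_neg=10, pos_thresh_m=25):
    """Sample a subset of queries and database items, extract features only
    for them, and pick the hardest negatives farther than 25 m from each query."""
    q_sample = np.random.choice(len(queries), n_queries, replace=False)
    db_sample = np.random.choice(len(database), n_db, replace=False)

    q_feats = extract_features([queries[i] for i in q_sample])     # (n_queries, D)
    db_feats = extract_features([database[i] for i in db_sample])  # (n_db, D)

    triplets = []
    for qi, q_feat in zip(q_sample, q_feats):
        # Database items farther than 25 m are valid negatives; with ~500k
        # images spread over many cities, this set is essentially never empty.
        valid = [j for j in range(len(db_sample))
                 if gps_distance_m(queries[qi], database[db_sample[j]]) > pos_thresh_m]
        # Hardest negatives = closest in feature space among the valid candidates.
        dists = np.linalg.norm(db_feats[valid] - q_feat, axis=1)
        hardest = np.argsort(dists)[:num_neg]
        negatives = [db_sample[valid[k]] for k in hardest]
        triplets.append((qi, negatives))
    return triplets
```

Full mining would do the same but with `db_sample` covering the entire database, which is why it is so much more expensive on MSLS.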
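
Regarding point 3, here is a short example of how the Melbourne subset could be selected, assuming the dataset class in tvg/datasets/dataset.py accepts a `cities` argument as described above; the class name and the other arguments shown here are only illustrative:

```python
# Illustrative only: the exact class name and constructor arguments
# may differ in the repository.
from tvg.datasets.dataset import TripletsDataset  # assumed class name

melbourne_ds = TripletsDataset(
    dataset_folder="/path/to/msls",  # illustrative dataset location
    cities="melbourne",              # restrict MSLS to the Melbourne subset
)
```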

wpumain commented 1 year ago

Thank you for your help!