uvavision / RerankingTransformer

[ICCV 2021] Instance-level Image Retrieval using Reranking Transformers
GNU General Public License v3.0

Using RRT for different datasets #11

Closed mhmd-mst closed 1 year ago

mhmd-mst commented 1 year ago

Hello, I am working on datasets for visual geolocalization and want to use RRT on them. Is it possible to use descriptors other than DELG? Also, does the variable src_positions refer to the keypoints, and is src_masks the attention mask? If so, what is the variable attention, and why wasn't it used?

fwtan commented 1 year ago

Hi,

Yes, it is possible to use other descriptors, especially in-domain descriptors, i.e. pretrained for visual geolocalization in your case.

By src_positions and src_masks, I would guess you're referring to the variables in this line: https://github.com/uvavision/RerankingTransformer/blob/c198e7e351d49a13260392b56df6b171653bb393/RRT_GLD/models/matcher.py#L33

Here, src_positions are the x, y coordinates of each descriptor. Because each image may have a different number of descriptors, when training and inference use mini-batches we need to tell the model how many descriptors of each image should be attended to. That is why we provide variables like src_masks. You can check how these variables are created in this function: https://github.com/uvavision/RerankingTransformer/blob/c198e7e351d49a13260392b56df6b171653bb393/RRT_GLD/utils/data/dataset.py#L32
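To make the role of the mask concrete, here is a minimal NumPy sketch of padding a variable number of descriptors to a fixed length and recording which slots are padding. It is not the repository's code: the helper name `pad_descriptors`, the array shapes, and the convention that `True` marks a padded slot (the same convention PyTorch uses for `key_padding_mask`) are all assumptions for illustration, so please check dataset.py for the convention RRT actually uses.

```python
import numpy as np


def pad_descriptors(descriptors, positions, max_len):
    """Pad one image's local descriptors to a fixed length (hypothetical helper).

    descriptors: (n, d) array of local descriptors (any extractor, not only DELG).
    positions:   (n, 2) array of x, y coordinates, one row per descriptor.
    Returns padded descriptors (max_len, d), padded positions (max_len, 2),
    and a boolean mask (max_len,) where True marks a padded slot (assumed convention).
    """
    n = min(descriptors.shape[0], max_len)  # truncate if there are too many
    dim = descriptors.shape[1]

    out_desc = np.zeros((max_len, dim), dtype=np.float32)
    out_pos = np.zeros((max_len, 2), dtype=np.float32)
    mask = np.ones(max_len, dtype=bool)     # start with everything marked as padding

    out_desc[:n] = descriptors[:n]
    out_pos[:n] = positions[:n]
    mask[:n] = False                        # real descriptors should be attended to
    return out_desc, out_pos, mask


# Two images with different numbers of descriptors (3 and 5), dimension 4.
rng = np.random.default_rng(0)
images = [
    (rng.normal(size=(3, 4)), rng.uniform(size=(3, 2))),
    (rng.normal(size=(5, 4)), rng.uniform(size=(5, 2))),
]

padded = [pad_descriptors(desc, pos, max_len=5) for desc, pos in images]
src_descriptors = np.stack([p[0] for p in padded])  # shape (2, 5, 4)
src_positions = np.stack([p[1] for p in padded])    # shape (2, 5, 2)
src_masks = np.stack([p[2] for p in padded])        # shape (2, 5)
```

With the mask in hand, a batch can mix images with different descriptor counts, and the padded slots can be excluded from attention. Descriptors from another extractor would go through the same padding step, as long as each one comes with an x, y position.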