jhgan00 / image-retrieval-transformers

(Unofficial) PyTorch implementation of *Training Vision Transformers for Image Retrieval* (El-Nouby et al., 2021).

Particular Object Retrieval #18

Open mennanyang opened 1 year ago

mennanyang commented 1 year ago

You have implemented category-level retrieval. Have you considered implementing particular object retrieval as well? Thank you!

jhgan00 commented 1 year ago

Hello.

Currently, we don't have any plans to implement instance-level image retrieval.

Thank you.

vpvsankar commented 1 year ago

@jhgan00 What should I do to build an instance-level retrieval? Is it possible to build it with your code?

jhgan00 commented 1 year ago

The most important thing is to implement the process of preparing and loading the dataset. Currently, the `__getitem__` method returns a tuple of one image and its label. However, the authors state that for the particular object retrieval task, they sample 2,000 positive pairs and 22,000 negative candidates through hard-negative mining at each epoch, and each batch consists of five tuples, where each tuple contains one anchor, one positive, and five negatives. I believe this part should be implemented in the dataset.

> We report results for image sizes of 224×224 and 384×384. For finetuning, each batch consists of 5 tuples of (1 anchor, 1 positive, 5 negatives). For each epoch, we randomly select 2,000 positive pairs and 22,000 negative candidates (using hard-negative mining). We use the default hyper-parameters of Radenović et al. (2018b): the models are optimized using Adam (Kingma & Ba, 2015) with small learning rate of 5·10⁻⁷ and weight decay of 10⁻⁶. The contrastive loss margin is set to β = 0.85. The models are finetuned for 100 epochs.
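
For reference, a minimal sketch of what such a dataset could look like is below. This is not part of the repository: the class name `TupleDataset`, the `positive_pairs`/`negative_pool` arguments, and the random negative sampling (a stand-in for real hard-negative mining against the current model) are all assumptions made for illustration.

```python
import random

import torch
from PIL import Image
from torch.utils.data import Dataset


class TupleDataset(Dataset):
    """Hypothetical dataset for instance-level retrieval finetuning.

    Each item is one tuple of (anchor, positive, negatives), following the
    paper's setup of 5 tuples per batch with 1 anchor, 1 positive, and
    5 negatives per tuple. Hard-negative mining is stubbed out here.
    """

    def __init__(self, positive_pairs, negative_pool, transform, num_negatives=5):
        # positive_pairs: list of (anchor_path, positive_path) tuples,
        #   e.g. the 2,000 pairs re-sampled at every epoch.
        # negative_pool: list of candidate negative image paths,
        #   e.g. the 22,000 candidates from hard-negative mining.
        # transform: torchvision-style transform producing a tensor.
        self.positive_pairs = positive_pairs
        self.negative_pool = negative_pool
        self.transform = transform
        self.num_negatives = num_negatives

    def __len__(self):
        return len(self.positive_pairs)

    def _load(self, path):
        return self.transform(Image.open(path).convert("RGB"))

    def __getitem__(self, idx):
        anchor_path, positive_path = self.positive_pairs[idx]
        # Placeholder: uniform random negatives. In the paper, negatives
        # are selected by hard-negative mining at the start of each epoch.
        negative_paths = random.sample(self.negative_pool, self.num_negatives)
        anchor = self._load(anchor_path)
        positive = self._load(positive_path)
        negatives = torch.stack([self._load(p) for p in negative_paths])
        return anchor, positive, negatives
```

Wrapped in a `DataLoader` with `batch_size=5`, this would yield batches matching the paper's five-tuple structure; the resulting embeddings would then feed a contrastive loss with margin β = 0.85, and `positive_pairs`/`negative_pool` would be rebuilt before each epoch.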