jhgan00 / image-retrieval-transformers

(Unofficial) PyTorch implementation of Training Vision Transformers for Image Retrieval (El-Nouby et al., 2021).

Performance on Cars196 dataset #17

Open m990130 opened 1 year ago

m990130 commented 1 year ago

Hi, thanks for this great repo! I've tried out a few runs, and they work nicely.

I've also tested this method on the Cars196 dataset, using the same setup as CUB (I wrote a dataset file for it, which is almost identical). However, it performed quite poorly, reaching only R@1 = 52%.

Since Cars196 is one of the most commonly evaluated datasets in the deep metric learning community, I wonder if you have any idea why this is the case. Usually, if a method works on CUB and SOP, it performs at least comparably on Cars196, but that is not happening here. Thanks in advance.
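For context, the conventional zero-shot protocol for Cars196 in the deep metric learning literature assigns the first 98 classes to training and the remaining 98 to evaluation, with disjoint classes, mirroring the CUB protocol. A minimal sketch of that split; the function name and sample format are illustrative, not taken from this repo:

```python
def split_cars196(samples):
    """Conventional deep-metric-learning split for Cars196.

    samples: list of (image_path, class_id) pairs with class_id in 1..196.
    Classes 1-98 go to training, 99-196 to evaluation, so train and
    test classes are disjoint (zero-shot retrieval, as for CUB/SOP).
    """
    train = [s for s in samples if s[1] <= 98]
    test = [s for s in samples if s[1] > 98]
    return train, test
```

If the dataset file accidentally uses a within-class split instead (same classes in train and test), recall numbers are not comparable to published results.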

jhgan00 commented 1 year ago

In my experience, the sampling strategy had a huge impact on performance. I'm not sure, but it might help to apply another sampling strategy such as m-per-class sampling (try something like `python main.py --m 4 ...`). Increasing the batch size can also help.
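For anyone following along, the m-per-class idea can be sketched as a custom PyTorch sampler that draws `batch_size / m` classes per batch and `m` images from each, so every batch is guaranteed to contain positive pairs. This is an illustrative sketch under my own naming, not the repo's actual sampler:

```python
import random
from collections import defaultdict

from torch.utils.data import Sampler


class MPerClassSampler(Sampler):
    """Yields indices so that each batch holds batch_size // m classes,
    with exactly m images per class (illustrative sketch)."""

    def __init__(self, labels, m, batch_size):
        assert batch_size % m == 0, "batch_size must be divisible by m"
        self.m = m
        self.batch_size = batch_size
        self.index_by_label = defaultdict(list)
        for idx, label in enumerate(labels):
            self.index_by_label[label].append(idx)
        self.num_batches = len(labels) // batch_size

    def __iter__(self):
        class_ids = list(self.index_by_label)
        for _ in range(self.num_batches):
            # pick batch_size / m distinct classes for this batch
            for label in random.sample(class_ids, self.batch_size // self.m):
                pool = self.index_by_label[label]
                # sample with replacement in case a class has fewer than m images
                yield from random.choices(pool, k=self.m)

    def __len__(self):
        return self.num_batches * self.batch_size
```

It would be passed to the loader as `DataLoader(dataset, batch_size=batch_size, sampler=MPerClassSampler(labels, m=4, batch_size=batch_size))`, keeping `batch_size` on the loader equal to the sampler's.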

Could you share your dataset code and training scripts? Maybe we can find hyperparameters that work well for Cars.

Thanks.

m990130 commented 1 year ago

Hi, thanks for the quick response. Because the results were already quite good compared to those achieved with conventional ConvNets, I did not pay much attention to the other hyperparameters. Indeed, batch size and m-per-class sampling are crucial.
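For what it's worth, the R@1 figure discussed above is conventionally computed by embedding the test set and checking, for each query, whether its nearest neighbor (excluding the query itself) shares the query's class. A minimal sketch of that standard formulation; the function name is my own, not from this repo:

```python
import torch


def recall_at_k(embeddings, labels, k=1):
    """Recall@K: fraction of queries whose K nearest neighbors
    (by cosine similarity, excluding the query itself) contain
    at least one item of the same class."""
    embeddings = torch.nn.functional.normalize(embeddings, dim=1)
    sims = embeddings @ embeddings.t()
    sims.fill_diagonal_(float("-inf"))  # a query must not retrieve itself
    topk = sims.topk(k, dim=1).indices  # (N, k) neighbor indices
    match = (labels[topk] == labels.unsqueeze(1)).any(dim=1)
    return match.float().mean().item()
```

If a reimplementation forgets to exclude the self-match, R@1 is inflated to nearly 100%, so this detail is worth double-checking when numbers look off.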