lenscloth / RKD

Official pytorch Implementation of Relational Knowledge Distillation, CVPR 2019

test_set and train_set's lengths mismatch with those based on README from the dataset #20

Closed alexzhang0825 closed 3 years ago

alexzhang0825 commented 3 years ago

Hello. When I ran the code, it reported 5864 images used for training and 5924 used for testing. However, according to the train_test_split.txt described in the README of the CUB200 dataset, there should be 5994 images for training and 5794 for testing. Do you know what causes this inconsistency? If so, could you point out which specific 130 images were moved between the two sets?

Thanks a lot

lenscloth commented 3 years ago

First of all, you need to understand the training and test splits used in the field of deep metric learning (DML). Unlike classification, where the test set consists of images from "seen" classes, DML models are evaluated on image retrieval for "unseen" classes.

Therefore, the train/test split of CUB200 for DML is different from that of CUB200 for classification. The first 98 classes are used for training, and the last 98 classes are used for testing.

See the experiments section of the following paper for details: https://cvgl.stanford.edu/papers/song_cvpr16.pdf

alexzhang0825 commented 3 years ago

I see. I'm still a bit confused, however. You said that the first 98 classes are for training while the last 98 are for testing, but what about the remaining 4 classes? Are they completely ignored? I ask because the numbers of train and test images shown in the code output sum to exactly the number of images in the dataset. Sorry if this question seems dumb.

lenscloth commented 3 years ago

Sorry for my typo, I confused CUB200 with Cars-196.

For CUB200, the first 100 of the 200 classes are used for training, and the last 100 classes are used for testing. For Cars-196, the first 98 of the 196 classes are used for training, and the last 98 classes are used for testing.
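For anyone who wants to see the class-based split concretely, here is a minimal sketch. It is not the repository's actual loader: the function name `split_by_class` and the `(path, label)` sample format with 0-based class labels are assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[str, int]  # (image path, 0-based class label) -- assumed format


def split_by_class(
    samples: List[Sample], num_train_classes: int
) -> Tuple[List[Sample], List[Sample]]:
    """Split so that train and test share no classes (DML-style split).

    Classes with label < num_train_classes go to train, the rest to test.
    """
    train = [s for s in samples if s[1] < num_train_classes]
    test = [s for s in samples if s[1] >= num_train_classes]
    return train, test


# Toy data: 4 classes with 3 images each (hypothetical file names).
toy = [(f"img_{c}_{i}.jpg", c) for c in range(4) for i in range(3)]
train, test = split_by_class(toy, num_train_classes=2)

train_classes = {label for _, label in train}
test_classes = {label for _, label in test}
```

With this scheme, `num_train_classes=100` would correspond to CUB200 and `num_train_classes=98` to Cars-196. Every image lands in exactly one set, which is why the train and test counts sum to the full dataset size even though they differ from the counts in train_test_split.txt.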