Question about how to prepare the dataset and data split

sheng-eatamath / PromptCAL

Official Implementation of paper: PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery (CVPR'23)

MIT License

Question about how to prepare the dataset and data split #1

Closed GritCS closed 1 year ago

GritCS commented 1 year ago

Hi! Thanks for your remarkable work on NCD; I'm very interested in it. While trying to reproduce it, I got confused about the dataset and the data split.

What's the difference between imagenet_100, imagenet_100_gcd and imagenet_original_100?
What does the file tree under your "/home/sheng/dataset/imagenet-img" look like, and how are the folders arranged? Here is my file tree under the image dataset root, but I hit a dataset-related bug when I run "methods.contrastive_training.contrastive_training_1", as shown in the third figure. Thank you very much! ovo

sheng-eatamath commented 1 year ago

Hello, thanks for your interest in our work. I will document the use of the ImageNet-100 dataset in the README later. Thanks for pointing this out. For the ImageNet-100 dataset, we simply sample 100 classes from the original ImageNet-1k dataset in another folder and name it the 'imagenet_100_gcd' dataset. This is the dataset we use in our experiments; the other two are deprecated. The class split we use is given in https://github.com/sgvaze/generalized-category-discovery/issues/12. The folder path you asked about is actually the training folder of the ImageNet-1k dataset (i.e., its subdirectories should all be synset names).
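For readers who want to check their own folder against this description, here is a minimal sketch (not taken from the PromptCAL repository). It assumes that "synset names" means the usual ImageNet directory names of the form `n` followed by eight digits, such as `n01440764`. The function name `check_imagenet_train_root` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: ImageNet class folders are named like "n01440764"
# (the letter "n" followed by eight digits).
SYNSET_RE = re.compile(r"^n\d{8}$")


def check_imagenet_train_root(root):
    """Split the entries directly under `root` into synset folders and anything else.

    Returns a tuple (synsets, unexpected), both sorted lists of entry names.
    """
    synsets, unexpected = [], []
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir() and SYNSET_RE.match(entry.name):
            synsets.append(entry.name)
        else:
            # Stray files or folders (e.g. ".ipynb_checkpoints") end up here.
            unexpected.append(entry.name)
    return synsets, unexpected


# Example usage (the path is a placeholder, replace it with your own root):
# synsets, unexpected = check_imagenet_train_root("/path/to/imagenet/train")
# print(len(synsets), "synset folders; unexpected entries:", unexpected)
```

If `unexpected` is non-empty, the root probably does not match the layout described above and is worth inspecting before training.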

sheng-eatamath commented 1 year ago

Also, please always remember to delete any .ipynb_checkpoints files after inspecting the dataset with notebooks.
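A minimal cleanup sketch, again not from the repository: it assumes the checkpoints are directories named `.ipynb_checkpoints` (which is what Jupyter creates) located anywhere under the dataset root. A likely reason they matter is that folder-based loaders such as torchvision's `ImageFolder` treat every subdirectory of the root as a class, although the thread does not state this explicitly. The function name `remove_ipynb_checkpoints` is hypothetical.

```python
import shutil
from pathlib import Path


def remove_ipynb_checkpoints(root):
    """Delete every ".ipynb_checkpoints" directory under `root`.

    Returns the list of removed paths so the caller can log them.
    """
    removed = []
    # Collect matches first so the tree is not modified while it is being walked.
    for path in list(Path(root).rglob(".ipynb_checkpoints")):
        # A nested match may already be gone if its parent was removed.
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Example usage (the path is a placeholder, replace it with your own root):
# for p in remove_ipynb_checkpoints("/path/to/imagenet/train"):
#     print("removed", p)
```

This permanently deletes the matched directories, so it is worth running it on the dataset root only and not on a folder that contains notebooks you still need checkpoints for.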