mdabbah / COOD_benchmarking

MIT License

The filtered subset of ImageNet-21k #2

Open liudakai2 opened 1 year ago

liudakai2 commented 1 year ago

Hi guys, congrats on your ICLR23 top-25% paper. The 'severity' benchmark in your work is impressive. However, so far I can only find the dummy dataset you released in this repo. If you could kindly release the filtered subset of ImageNet-21k used in your paper, it would be a great contribution to the OOD detection community.

charchit7 commented 1 year ago

Hi guys, I agree with @liudakai2. Please do share it as soon as possible; it will help us a lot.

mdabbah commented 1 year ago

Hi guys 👋 I've added instructions for downloading and extracting the dataset to the readme file (expected size is around 200GB).

Please let us know if you face any problems 😄

liudakai2 commented 1 year ago

Excited to hear that! I will give it a try immediately.

charchit7 commented 1 year ago

Sure, thank you!

ZhiyangLiang commented 10 months ago

Hi guys, would you mind providing the filtered version of ImageNet-21k-val? As the filtered version of ImageNet-21k-train is still very large, I believe the filtered version of ImageNet-21k-val would be a great help for the OOD detection community as well~

mdabbah commented 10 months ago

Hi 👋 We provide the filtered version of ImageNet-21k, which is around 200GB (see the instructions in the readme on how to obtain it).

As we discussed in our work, the severities depend on the model and its confidence function, so the eval set for one model differs from the eval set for another.

You'd need the whole set (200GB, split between estimation and validation): the estimation set is used to estimate the severity per class, and the validation set is then used to calculate the OOD performance at the desired severity.
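To make the two-stage procedure more concrete, here is a minimal Python sketch of the grouping step. It is not the repository's code: the function name `group_classes_by_severity`, the use of mean confidence as the per-class severity score, and the equal-sized grouping are all assumptions made for illustration (see the paper for the exact definitions).

```python
import numpy as np

def group_classes_by_severity(confidences, class_ids, num_levels):
    """Rank OOD classes by a per-class score computed on the estimation set
    and split the ranking into `num_levels` groups (hypothetical helper)."""
    confidences = np.asarray(confidences, dtype=float)
    class_ids = np.asarray(class_ids)
    classes = np.unique(class_ids)
    # Assumption: a class's severity score is the mean confidence the model
    # assigns to that class's estimation samples.
    scores = np.array([confidences[class_ids == c].mean() for c in classes])
    # Sort classes from lowest to highest score, then cut into groups.
    ranked = classes[np.argsort(scores, kind="stable")]
    return [group.tolist() for group in np.array_split(ranked, num_levels)]

# Toy estimation set: three OOD classes, two samples each.
est_conf = [0.9, 0.8, 0.2, 0.1, 0.5, 0.6]
est_cls = [0, 0, 1, 1, 2, 2]
levels = group_classes_by_severity(est_conf, est_cls, num_levels=3)

# Validation samples are selected only after the groups are fixed.
val_cls = np.array([0, 1, 1, 2, 0])
val_mask_top_level = np.isin(val_cls, levels[-1])
```

Because the scores come from the model's own confidences, a different model or confidence function would generally produce a different grouping, which is why a single fixed validation subset cannot be shared across models.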

hope this helps

ZhiyangLiang commented 10 months ago

Thanks for your reply! However, what I want is a smaller dataset with 21k classes as a surrogate dataset for OOD detection (instead of training or evaluating on ImageNet-21k directly). If you have already provided the filtered version of ImageNet-21k-val, could you tell me which part of the 200GB dataset it is? (Sorry, but I'm unable to open the hyperlink you mentioned above.) Thanks a lot~

IdoGalil commented 9 months ago

Hey @ZhiyangLiang, due to the nature of severity levels, the entire (filtered) dataset we've provided (200GB) is necessary for evaluating new models.

The validation set (or test set, as we call it in the paper) part of the dataset, which is 25% of the dataset, is only used after the classes are divided into the different severity groups. You can find more details about it in the paper if you'd like, but the short answer is that you need the entire 200GB.

However, you could use our framework to benchmark your models with a different dataset (not our filtered ImageNet-21k). For example, you could download only a certain part of ImageNet-21k (or any other OOD dataset, for that matter) and use it with our code to evaluate your models. For an example of how to load custom datasets, check our notebook, under "The basics". Check how "dummy_ood_dataset_info" was defined and used: https://github.com/mdabbah/COOD_benchmarking/blob/main/example.ipynb

khawar-islam commented 3 months ago

Dear @mdabbah, the link is not working. Please check it and let us know.

mdabbah commented 3 months ago

Hi @khawar-islam 👋 Thank you for letting us know about the broken link!

I've fixed the broken link to the dataset; please follow the instructions in the readme to download it.

Please don't hesitate to write to us if you face any further issues.

khawar-islam commented 3 months ago

Hello @mdabbah, it is very hard to download the ImageNet-21k dataset, as it is extremely big and takes too much time. Please help me out; anything you can give would be helpful for me.

IdoGalil commented 3 months ago

Hey, the repo contains a fixed link to only the filtered version of ImageNet-21k, which is much smaller than the entire dataset. Here's the link, also available in the repo's readme: https://technionmail-my.sharepoint.com/personal/mdabbah_alumni_technion_ac_il/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fmdabbah%5Falumni%5Ftechnion%5Fac%5Fil%2FDocuments%2FImageNet%5F12K%5Fdataset%2Fdataset%5Fparts&ga=1

For more details on how to extract the filtered dataset, please see "Instructions for downloading and extracting our filtered version of ImageNet-21k" in the readme.