mlfoundations / datacomp

DataComp: In search of the next generation of multimodal datasets
http://datacomp.ai/

train/test splits for downstream tasks #37

Closed bluer555 closed 11 months ago

bluer555 commented 11 months ago

Hello!

For some of the downstream datasets, the train/test split is not documented. Could you please share how the train and test subsets are divided, so that we can avoid using test images? The tasks with an unknown train/test split are: Caltech-101 [41], DTD [26], EuroSAT [57, 147], KITTI distance [44, 147], RESISC45 [23, 147], Dollar Street [115], and GeoDE [107].

Thanks a lot!

gabrielilharco commented 11 months ago

Hey @bluer555, you can find all of the test sets at https://huggingface.co/djghosh. For example, the Caltech-101 test set is at https://huggingface.co/datasets/djghosh/wds_vtab-caltech101_test. The datasets are stored in the webdataset format.
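
Since the test sets are published as webdataset shards (plain tar archives in which files sharing a basename form one sample), one way to avoid test images is to hash every test image and drop exact matches from a training pool. The following is only a rough sketch under stated assumptions, not the DataComp tooling: the helper names `collect_image_hashes` and `is_test_image` are hypothetical, the list of image extensions is a guess that should be checked against the actual shards, and a byte-level SHA-256 only catches exact copies, not resized or re-encoded ones.

```python
import hashlib
import tarfile
from pathlib import Path

# Assumption: image members in the shards use one of these extensions.
# Inspect a real shard (e.g. with `tar tf`) and adjust if needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def collect_image_hashes(shard_paths):
    """Return the SHA-256 hex digests of all image files in the given tar shards."""
    hashes = set()
    for shard in shard_paths:
        with tarfile.open(shard, "r") as tar:
            for member in tar:
                # Skip directories and non-image members such as labels.
                if not member.isfile():
                    continue
                if Path(member.name).suffix.lower() not in IMAGE_EXTENSIONS:
                    continue
                data = tar.extractfile(member).read()
                hashes.add(hashlib.sha256(data).hexdigest())
    return hashes


def is_test_image(image_bytes, test_hashes):
    """Check whether raw image bytes exactly match an image from the test shards."""
    return hashlib.sha256(image_bytes).hexdigest() in test_hashes
```

The shards themselves would need to be downloaded first, for instance with `snapshot_download(repo_id="djghosh/wds_vtab-caltech101_test", repo_type="dataset")` from the `huggingface_hub` package, and the resulting `.tar` paths passed to `collect_image_hashes`. Catching near-duplicates would require a perceptual hash or embedding-based comparison instead.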