rom1504 opened 3 years ago
I will add information here as I download datasets. Starting with CC3M: I intend to download it, then produce CLIP embeddings (using https://github.com/rom1504/clip-retrieval/) and a list of CLIP-filtered files.
Once things are clear enough, I will open a PR to the README.
I downloaded CC3M and CC12M (improving their script a bit in the process).
CC3M would clearly take much less time with the improved CC12M script. I also confirmed in the process that handling millions of files is painful, so I will make it possible to download directly as a collection of tars (the webdataset format).
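To illustrate the tar idea, here is a minimal sketch of packing samples into numbered tar shards instead of writing millions of loose files. It assumes the webdataset convention that files sharing a basename inside a tar form one sample. The function name `write_shards`, the `shard_size` parameter, and the `.jpg`/`.txt` extensions are hypothetical choices for illustration, not what the actual download script does.

```python
import io
import tarfile
from pathlib import Path


def write_shards(samples, out_dir, shard_size=1000):
    """Pack (key, image_bytes, caption) tuples into numbered tar shards.

    Hypothetical helper: each sample becomes <key>.jpg and <key>.txt
    inside a shard, so one sample is a pair of files with the same basename.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = []
    tar = None
    for i, (key, image_bytes, caption) in enumerate(samples):
        # Start a new shard every `shard_size` samples.
        if i % shard_size == 0:
            if tar is not None:
                tar.close()
            path = out_dir / f"{i // shard_size:05d}.tar"
            tar = tarfile.open(path, "w")
            shard_paths.append(path)
        # Write the image and its caption as in-memory tar members.
        for ext, payload in (("jpg", image_bytes), ("txt", caption.encode("utf-8"))):
            info = tarfile.TarInfo(name=f"{key}.{ext}")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    if tar is not None:
        tar.close()
    return shard_paths
```

With something like this, a dataset of a few million images ends up as a few thousand tar files, which is much easier to copy, list, and stream than one file per image.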
@rom1504 the doc is not available now.
I want to download the data; can you please help me?
I only found the download_open_images.txt file in the repo. How do I download using a text file?
Having a table listing each dataset with some information about size and time to download would be useful. https://docs.google.com/document/d/1KCAB-OTHphcCh-4oITIL8r7ih-HuslMKX1Rls_P03CY/edit could serve as complementary information.