Closed by prajwalgatti 2 years ago
Hi @prajwalgatti ,
Thank you for your interest.
In this case, you can download only the index files with:
path/to/azcopy copy https://tapvqacaption.blob.core.windows.net/data/data/imdb/cc
The "image_name" field in the index files contains the CC image IDs. Thank you.
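For anyone who wants to turn the index files into a plain list of CC IDs, here is a minimal sketch. It is not taken from the repository: it assumes each index file is a pickled NumPy array whose entries are dicts, some of which carry an "image_name" key. The helper names `load_records` and `collect_image_ids` are made up for illustration, so please check the actual file layout before relying on it.

```python
from typing import Any, Iterable, Set


def load_records(index_path: str):
    """Load one index file.

    ASSUMPTION: the file is a pickled NumPy array of dict entries.
    Adjust this function if the real format differs.
    """
    import numpy as np  # imported lazily; only needed for this assumed format
    return np.load(index_path, allow_pickle=True)


def collect_image_ids(records: Iterable[Any]) -> Set[str]:
    """Gather the "image_name" value of every dict entry that has one."""
    ids = set()
    for entry in records:
        # Skip anything that is not a record with an "image_name" field
        # (for example a leading metadata entry, if the file has one).
        if isinstance(entry, dict) and "image_name" in entry:
            ids.add(str(entry["image_name"]))
    return ids
```

With the index files downloaded, something like `collect_image_ids(load_records("path/to/index_file"))` (the path is a placeholder) should give the set of IDs to look up in a local copy of CC.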
Hello @zyang-ur, and all
Thanks for this work, it is quite interesting.
I'm trying to obtain the OCR-CC dataset, but due to my constraints I can't download the full 1.7 TB. However, I already have the CC dataset, so I could extract the subset of images that are in OCR-CC myself.
Could you please share the image IDs of CC that were used to construct OCR-CC?
Thanks in advance!