jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License

Could only download 400k images #54

Closed gsrivas4 closed 3 years ago

gsrivas4 commented 4 years ago

I am trying to use your script to download the Conceptual Captions dataset. I was able to download only 400k of the 3M images in the training set. I have run the script 5 times, since the images may be hosted on unreliable servers, but many of the links no longer seem to work. If you have the images downloaded somewhere, would it be possible for you to share the dataset?
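For anyone rerunning the download several times, here is a minimal sketch of a resumable loop that skips files already on disk and collects the URLs that still fail, so each pass only retries what is missing. This is not the repository's script: `download_all`, `fetch_url`, and the hash-based file naming are hypothetical names and choices made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_url(url, timeout=10):
    """Return the raw bytes at `url`; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, out_dir, fetch=fetch_url):
    """Download each URL once, skipping files that already exist.

    Returns (saved, failed): URLs now on disk, and URLs still missing.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        # Assumed naming scheme: hash of the URL, so reruns map to the same file.
        name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".img"
        target = out_dir / name
        if target.exists() and target.stat().st_size > 0:
            saved.append(url)  # already fetched on an earlier run
            continue
        try:
            data = fetch(url)
            if not data:
                raise ValueError("empty response")
            target.write_bytes(data)
            saved.append(url)
        except Exception:
            # Dead link, timeout, or server error: record it and move on.
            failed.append(url)
    return saved, failed
```

Feeding the returned `failed` list back into `download_all` on a later run retries only those links. If the same URLs fail every time, the links are most likely dead rather than temporarily unreachable.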

menggehe commented 4 years ago

I have run into a similar problem. I downloaded the VCR dataset and hit this error:

FileNotFoundError: [Errno 2] No such file or directory: './data/vcr/vcr1images/movieclips_The_Jackal/AA4zkmfbFD0@0.json'

Do you have the complete dataset? Thank you.
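To tell whether one file is missing or the whole download is incomplete, a quick check like the following may help. It is only a sketch: `find_missing` is a hypothetical helper, and the list of expected relative paths has to come from your own annotation files.

```python
from pathlib import Path


def find_missing(root, relative_paths):
    """Return the entries of `relative_paths` that are not files under `root`."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]
```

For example, `find_missing("./data/vcr/vcr1images", ["movieclips_The_Jackal/AA4zkmfbFD0@0.json"])` returns that path if the file is absent and an empty list otherwise.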

gsrivas4 commented 4 years ago

@menggehe No, I could not download the whole dataset. I am using only those 400k training images. It would be good to get the complete dataset, though.

jackroos commented 3 years ago

I am sorry, but I have not found a way to share such a large dataset for now. @gsrivas4

zhangdabusy commented 3 years ago

> @menggehe No, I could not download the whole dataset. I am using only those 400k training images. It would be good to get the complete dataset, though.

I got it. Reply to me and I will send it to you.

zhangdabusy commented 3 years ago

> I am sorry, but I have not found a way to share such a large dataset for now. @gsrivas4

> I am trying to use your script to download the Conceptual Captions dataset. I was able to download only 400k of the 3M images in the training set. I have run the script 5 times, since the images may be hosted on unreliable servers, but many of the links no longer seem to work. If you have the images downloaded somewhere, would it be possible for you to share the dataset?

Why do you need all 3M images? What for?