salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

Vg_caption #27

Closed: hhzb123 closed this issue 2 years ago

hhzb123 commented 2 years ago

In the pretraining stage, Visual Genome is used as part of the training data, but I cannot find vg_caption.json. Could you tell me where I can download it?

LiJunnan1992 commented 2 years ago

Hi, you can download it here: https://storage.googleapis.com/sfr-vision-language-research/datasets/vg_caption.json Note that you need to change the image paths in the json file to your own local paths.

Thanks!
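For reference, a minimal sketch of rewriting the paths, assuming the file follows the list-of-dicts annotation format used elsewhere in BLIP (entries with "image" and "caption" keys); the local root below is a placeholder:

```python
import json
import os

VG_ROOT = '/path/to/VG/images'  # hypothetical local image root; adjust to your setup

with open('vg_caption.json') as f:
    annotations = json.load(f)  # expected format: list of {"image": ..., "caption": ...}

for ann in annotations:
    # Keep only the filename and re-root it under the local image directory.
    ann['image'] = os.path.join(VG_ROOT, os.path.basename(ann['image']))

with open('vg_caption.json', 'w') as f:
    json.dump(annotations, f)
```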

hhzb123 commented 2 years ago

Does it come from the VG region descriptions?

LiJunnan1992 commented 2 years ago

Yes!
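For anyone curious, a rough sketch of how such a caption file could be derived from the public region_descriptions.json in the Visual Genome release (the exact filtering and deduplication behind vg_caption.json is not specified in this thread, so this is an illustration, not the actual preprocessing; the output filename and image path prefix are placeholders):

```python
import json

# region_descriptions.json from the official Visual Genome release:
# a list of {"id": <image_id>, "regions": [{"phrase": ..., ...}, ...]} entries.
with open('region_descriptions.json') as f:
    images = json.load(f)

captions = []
for img in images:
    image_id = img['id']
    for region in img['regions']:
        captions.append({
            'image': f'VG_100K/{image_id}.jpg',  # placeholder; VG images span VG_100K and VG_100K_2
            'caption': region['phrase'].strip(),
        })

with open('vg_caption_rebuilt.json', 'w') as f:
    json.dump(captions, f)
```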

hhzb123 commented 2 years ago

There are 821k texts in vg_caption, but the paper reports 769k. So I'd like to know whether you use all of them for pretraining?

LiJunnan1992 commented 2 years ago

Yes, all texts are used for pretraining. I will update the paper accordingly, thanks for pointing it out!
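As a quick sanity check, the counts can be verified directly on the downloaded file, again assuming the list-of-dicts annotation format above:

```python
import json

with open('vg_caption.json') as f:
    annotations = json.load(f)

print(len(annotations), 'captions')  # ~821k per the discussion above
print(len({a['image'] for a in annotations}), 'unique images')
```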