salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License
4.86k stars 648 forks

Details about Visual-genome Dataset #79

Closed FingerRec closed 2 years ago

FingerRec commented 2 years ago

Thanks for this good work!

I see that there are two image parts listed at https://visualgenome.org/api/v0/api_home.html.

However, vg_caption.json does not indicate how these two subsets should be processed.

Could you kindly provide more details about how the VG dataset was processed?

BTW, did you use V1.0 or V1.1 for VG?
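For anyone hitting the same two-part download: a common workaround (not necessarily what the BLIP authors did) is to either merge both image folders into one directory, or resolve each image against both parts at lookup time. A minimal sketch of the latter, assuming the two official zips extract to their default `VG_100K` and `VG_100K_2` directories and images are named by their integer id:

```python
from pathlib import Path


def resolve_vg_image(image_id, vg_root):
    """Return the path to a Visual Genome image, searching both
    download parts. Directory names below are the defaults created
    when the two official zips are extracted."""
    for part in ("VG_100K", "VG_100K_2"):
        candidate = Path(vg_root) / part / f"{image_id}.jpg"
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"image {image_id} not found under {vg_root}")
```

A dataset loader can then call `resolve_vg_image` per annotation entry, so the two subsets never need to be distinguished in the annotation file itself.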

woctezuma commented 2 years ago

Maybe related:

FingerRec commented 2 years ago

Solved, thanks for your timely feedback!