mlpc-ucsd / BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
https://arxiv.org/abs/2308.09936
BSD 3-Clause "New" or "Revised" License

Training datasets #10

Closed UnderTheMangoTree closed 11 months ago

UnderTheMangoTree commented 11 months ago

Thanks for your work! How can I get "blip_laion_cc_sbu_558k.json" and "bliva_llava_150k.json"? The original llava_150k.json doesn't fit the dataloading module, which expects fields such as `ann["question"]`.

gordonhu608 commented 11 months ago

Thank you for your interest in our work. We acknowledged LLaVA and used their pre-training dataset here: https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/blob/main/blip_laion_cc_sbu_558k.json. As for llava_150k, each record has multiple nested questions and answers. We simply unnested (flattened) the data into one question-answer pair per data point.
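
Since the flattened file isn't distributed, here is a minimal sketch of how one could reproduce it. It assumes the standard llava_instruct_150k.json layout (a `conversations` list alternating `human`/`gpt` turns); the output keys `question`/`answer` match the `ann["question"]` access mentioned above, but the exact BLIVA schema and file paths may differ.

```python
import json

# Hypothetical paths -- adjust to where the files actually live.
SRC = "llava_instruct_150k.json"
DST = "bliva_llava_150k.json"

with open(SRC) as f:
    data = json.load(f)

flat = []
for ann in data:
    convs = ann["conversations"]
    # Conversations alternate human/gpt turns; pair them up.
    for human, gpt in zip(convs[0::2], convs[1::2]):
        flat.append({
            "image": ann["image"],
            # Strip the "<image>" placeholder LLaVA prepends to the
            # first human turn, leaving just the question text.
            "question": human["value"].replace("<image>", "").strip(),
            "answer": gpt["value"],
        })

with open(DST, "w") as f:
    json.dump(flat, f)
```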

UnderTheMangoTree commented 11 months ago

okay, thanks for your help!😁