mlpc-ucsd / BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
https://arxiv.org/abs/2308.09936
BSD 3-Clause "New" or "Revised" License
257 stars 26 forks

Some question about train dataset? #14

Closed shipengai closed 10 months ago

shipengai commented 10 months ago

As the paper says, "Instead, it leverages a more compact 0.5M pre-training caption data following LLaVA." Does this mean that in the first pretraining stage, the training dataset is only blip_laion_cc_sbu_558k.json?

gordonhu608 commented 10 months ago

Yes, this is the amount of data used for the initial alignment stage.
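For readers inspecting the file mentioned above: blip_laion_cc_sbu_558k.json is the LLaVA-style pretraining annotation file, a JSON list of roughly 558k records, each pairing an image path with a short caption conversation. The sketch below is illustrative only, using a single mock record with placeholder values (the field names follow the LLaVA annotation schema; the image path and caption here are invented):

```python
import json
import os
import tempfile

# Mock stand-in for blip_laion_cc_sbu_558k.json. The real file is a
# JSON list of ~558k records in this LLaVA-style schema; the values
# below are placeholders, not real dataset entries.
mock_records = [
    {
        "id": "000000001",
        "image": "00000/000000001.jpg",  # placeholder relative path
        "conversations": [
            {"from": "human", "value": "Describe the image briefly.\n<image>"},
            {"from": "gpt", "value": "a placeholder caption for illustration"},
        ],
    }
]

path = os.path.join(tempfile.mkdtemp(), "blip_laion_cc_sbu_558k.json")
with open(path, "w") as f:
    json.dump(mock_records, f)

# Loading and counting records; on the real file this prints ~558128,
# matching the "0.5M pre-training caption data" figure in the paper.
with open(path) as f:
    samples = json.load(f)

num_samples = len(samples)
print(num_samples)
print(samples[0]["image"])
```

Counting `len(samples)` on the real file is a quick sanity check that you downloaded the same ~0.5M-sample split used for stage-1 alignment.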