mlpc-ucsd / BLIVA

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
https://arxiv.org/abs/2308.09936
BSD 3-Clause "New" or "Revised" License
268 stars 27 forks source link

how to get the train file? #25

Open jiinhui opened 3 months ago

jiinhui commented 3 months ago

I can't find the train data files of "BLIVA/bliva/data/llava/bliva_llava_150k.json" and "BLIVA/bliva/data/ocrVQA/cleaned_train_dataset.json". Can you tell me how to download them? Thanks!

gordonhu608 commented 3 months ago

For ocrVQA train data, you can refer to this issue, https://github.com/mlpc-ucsd/BLIVA/issues/12. The paper should mention we used a prompt "OCR tokens: {}" to add OCR tokens directly after the question. As for bliva_llava_150k, I think it's the version of converting llava150k to single-turn chat history. Check the details in paper.

jiinhui commented 3 months ago

For ocrVQA train data, you can refer to this issue, #12. The paper should mention we used a prompt "OCR tokens: {}" to add OCR tokens directly after the question. As for bliva_llava_150k, I think it's the version of converting llava150k to single-turn chat history. Check the details in paper.

I can't find the details about converting llava150k to single-turn chat in your paper. I will try to review InstructBLIP for more details about the dataset.