haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

Some images of ocr vqa data in llava_v1_5_mix665k.json do not exist! #1618

Open Vicent0205 opened 1 month ago

Vicent0205 commented 1 month ago

Question

When I fine-tune using the mix665k.json file, I find that some of the ocr_vqa images do not exist. There are 80,000 ocr_vqa entries in the mix665k file, but the images for 355 of them are missing after running the given download script.
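
A minimal sketch of how such a count could be reproduced. It assumes the annotation file is a JSON list of records, each with an optional "image" field holding a path relative to the data root (for example ocr_vqa/images/1437717772.jpg), and that the data root is playground/data. The function name `find_missing_images` and the default file locations are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def find_missing_images(annotation_file, image_root, prefix="ocr_vqa/"):
    """Return the relative image paths under `prefix` that are absent on disk."""
    with open(annotation_file, "r", encoding="utf-8") as f:
        records = json.load(f)

    root = Path(image_root)
    missing = []
    for record in records:
        # Assumption: records without an "image" key are text-only and are skipped.
        rel_path = record.get("image")
        if rel_path is None or not rel_path.startswith(prefix):
            continue
        if not (root / rel_path).is_file():
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    # Assumed locations; adjust to wherever the JSON file and images are stored.
    annotation = Path("playground/data/llava_v1_5_mix665k.json")
    if annotation.is_file():
        missing = find_missing_images(annotation, "playground/data")
        print(f"{len(missing)} ocr_vqa images are missing")
        for rel_path in missing[:10]:
            print(rel_path)
    else:
        print(f"annotation file not found: {annotation}")
```

The returned list can be used either to re-download the missing files or to filter the affected records out of the JSON before fine-tuning.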

Georgefwt commented 1 month ago

You can download the whole ocr_vqa dataset here. I tested it, and it worked.

https://huggingface.co/datasets/ej2/llava-ocr-vqa

dacian7 commented 1 month ago

@Georgefwt Thank you!

Georgefwt commented 1 month ago

Additionally, in this dataset, 1437717772.jpg seems to be corrupted and needs to be downloaded again:

wget http://ecx.images-amazon.com/images/I/51YTH4k3fUL.jpg
cp 51YTH4k3fUL.jpg playground/data/ocr_vqa/images/1437717772.jpg
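
To look for other damaged files, a rough heuristic is to check that each file starts and ends with the JPEG markers. This is only a sketch: it flags truncated downloads and any file whose bytes are not JPEG at all, but it does not decode the image, so some kinds of corruption will pass unnoticed and a valid file with trailing bytes will be flagged. The function names and the directory default are hypothetical.

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_truncated(path):
    """Heuristic: True if the file does not begin with SOI and end with EOI."""
    data = Path(path).read_bytes()
    return not (data.startswith(JPEG_SOI) and data.endswith(JPEG_EOI))


def find_suspect_images(image_dir):
    """Return the sorted names of .jpg files in image_dir that fail the check."""
    return sorted(
        p.name for p in Path(image_dir).glob("*.jpg") if looks_truncated(p)
    )


if __name__ == "__main__":
    # Assumed image directory, taken from the cp command above.
    image_dir = Path("playground/data/ocr_vqa/images")
    if image_dir.is_dir():
        for name in find_suspect_images(image_dir):
            print(name)
    else:
        print(f"image directory not found: {image_dir}")
```

Any file this reports is worth opening with a real image decoder before deciding to re-download it.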

divisionblur commented 3 weeks ago

Thank you

Jike338 commented 1 week ago

Thank you, this was immensely helpful.