microsoft / LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

biomedical concept alignment data #85

Open hddbang opened 1 month ago

hddbang commented 1 month ago

In Section 3 (Biomedical Concept Alignment Data) of the paper "LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day", it is mentioned that "We sample 600K image-text pairs from PMC-15M".

However, the data file actually provided in the GitHub repository (llava_med_alignment_500k.json) contains only 500K pairs. Where did the remaining 100K pairs go?
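
For reference, a minimal way to check the pair count locally (a sketch, assuming the file is a plain JSON list like the other released LLaVA-Med data files):

```python
# Minimal sketch (not from the repo): count the image-text pairs in the
# released alignment file. Assumes the file is a single JSON list.
import json

with open("llava_med_alignment_500k.json") as f:
    pairs = json.load(f)

print(f"Total entries: {len(pairs)}")
```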

alyakin314 commented 1 month ago

Furthermore, the "500k" file actually contains only:

Total entries: 467710
Present images: 467336
Missing images: 374

And as demonstrated above, the download script fails to fetch a portion of the articles, which results in further missing images. The script is also extremely slow even when parallelized across 200 threads :) it took about 4 days.
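
For anyone who wants to reproduce those numbers, here is a minimal sketch; it assumes each entry has an "image" field holding a path relative to a local image directory (the directory name below is hypothetical), so adjust keys and paths to your setup:

```python
# Minimal sketch for checking how many referenced images are present locally.
# Assumptions: the alignment file is a JSON list and each entry has an
# "image" field with a path relative to IMAGE_ROOT (hypothetical path below).
import json
from pathlib import Path

IMAGE_ROOT = Path("data/images")  # adjust to wherever the images were downloaded

with open("llava_med_alignment_500k.json") as f:
    entries = json.load(f)

present = sum(1 for e in entries if (IMAGE_ROOT / e["image"]).exists())
print(f"Total entries:  {len(entries)}")
print(f"Present images: {present}")
print(f"Missing images: {len(entries) - present}")
```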