yuecao0119 / MMInstruct

The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". The MMInstruct dataset includes 973K instructions from 24 domains and four instruction types.
Apache License 2.0

dataset bug #1

Status: Open. bobo0810 opened this issue 2 weeks ago.

bobo0810 commented 2 weeks ago

@yuecao0119 Hello, in caption_en.json the images and captions are misaligned, and some of the entries are in Chinese.
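One way to locate the Chinese-language entries mentioned in this report is to scan every string in each record for CJK ideographs. This is a minimal sketch that assumes caption_en.json is a JSON array of objects. Because the caption field name is not stated in this thread, the sketch searches all string values instead of assuming a key. The helper names are hypothetical, and the range U+4E00 to U+9FFF covers the main CJK Unified Ideographs block, not every Chinese character.

```python
import json
import re

# Main CJK Unified Ideographs block; flags most, but not all, Chinese text.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def load_records(path):
    """Load a JSON file assumed to hold a list of record objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_strings(value):
    """Yield every string nested anywhere inside a JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_strings(v)


def find_chinese_records(records):
    """Return indices of records containing at least one CJK character."""
    return [
        i
        for i, record in enumerate(records)
        if any(CJK_PATTERN.search(s) for s in iter_strings(record))
    ]
```

For example, `find_chinese_records(load_records("caption_en.json"))` returns the indices of the flagged entries, which can then be inspected by hand.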

yuecao0119 commented 2 weeks ago

Thanks for your question.

But could you tell me which images don't match? I tried to access every image path in the JSON file and found no missing images.

Note that the image paths in our JSON file are relative, for example "image": "image_comparison/0001/00000053.jpg". You may need to prepend the directory where you actually stored the images, for example "home/images/"+"image_comparison/0001/00000053.jpg".
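The path handling described above, together with the check for missing images, can be sketched as follows. This is a minimal illustration, assuming the JSON file is an array of objects whose "image" field holds a path relative to an image root directory. The function names and the `image_root` parameter are hypothetical, and records without an "image" field are simply skipped.

```python
import json
import os


def load_records(path):
    """Load a JSON file assumed to hold a list of record objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def resolve_image_paths(records, image_root):
    """Join image_root with each record's relative "image" path.

    Records that have no "image" field are skipped.
    """
    return [
        os.path.join(image_root, record["image"])
        for record in records
        if "image" in record
    ]


def find_missing_images(records, image_root):
    """Return the resolved paths that do not point to an existing file."""
    return [
        p for p in resolve_image_paths(records, image_root) if not os.path.isfile(p)
    ]
```

For example, `find_missing_images(load_records("caption_en.json"), "home/images/")` should return an empty list when the root directory is set correctly. A non-empty result usually means the root is wrong rather than that the dataset is broken.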