haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.53k stars, 2.27k forks

[Question] The codebase for data generation #125

Open DL-ML opened 1 year ago

DL-ML commented 1 year ago

Question

Hi. Thanks for making the codebase public. Which part contains the complete code for data generation? And how do you get the five captions associated with each image, which are used as input to GPT-4?

haotian-liu commented 1 year ago

Hi, thank you for your interest in our work.

Please see here for the prompts and few-shot examples that we use for data generation.

In the current version, we use the COCO 2014 dataset for the associated captions, images, and bounding boxes.
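
For readers wondering how the captions can be collected per image, here is a minimal sketch. It is not the authors' code. It assumes the standard COCO caption annotation layout (a top-level `annotations` list whose entries carry `image_id` and `caption`). The helper names `group_captions` and `load_captions` are hypothetical, and the file name in the usage note is an assumption, since the thread does not confirm which split is used.

```python
import json
from collections import defaultdict


def group_captions(coco):
    """Map each image_id to the list of captions annotated for it."""
    captions = defaultdict(list)
    for ann in coco["annotations"]:
        captions[ann["image_id"]].append(ann["caption"].strip())
    return dict(captions)


def load_captions(path):
    """Read a COCO caption annotation file and group its captions by image."""
    with open(path, encoding="utf-8") as f:
        return group_captions(json.load(f))


# Tiny in-memory stand-in for an annotation file (made-up captions).
sample = {
    "annotations": [
        {"image_id": 1, "id": 10, "caption": "A dog runs on the beach. "},
        {"image_id": 2, "id": 11, "caption": "A red bus on a street."},
        {"image_id": 1, "id": 12, "caption": "A brown dog near the sea."},
    ]
}

grouped = group_captions(sample)

# One caption per line, as a plain-text context block for a language model.
context = "\n".join(grouped[1])
```

With a real download, something like `load_captions("annotations/captions_train2014.json")` would return the same mapping for the whole split (path assumed). COCO images usually have five captions each, so each list can be joined into the text context as shown.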

Thanks.

leimeng86 commented 1 year ago

Are you using the COCO 2014 training split?