microsoft / LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

Question about the collection of "Instruction-following data" #57

Open CinKKKyo opened 8 months ago

CinKKKyo commented 8 months ago

I ran into a few questions while working on a reimplementation of LLaVA-Med:

  1. The "GPT-4 Assisted Data Generation -- Generate visual instruct tuning conversations using GPT-4" step uses an image caption file named "llava_med_instruct_fig_captions.json", but I could not find it among the corresponding files. How should the image caption file be organized so that we can build our own data? Could you provide it?

  2. Also in the "GPT-4 Assisted Data Generation" process, what does "inline mentions (IM)" mean? Are these the sentences in the related paper that mention the corresponding figure?

thedaffodil commented 3 months ago

Did you find an answer to your questions?

qm-intel commented 3 months ago

@CinKKKyo Did you find an answer to your question 1?

Regarding your question 2: the inline mentions are the sentences in the article that refer to the figure.
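
For anyone collecting their own data, here is a minimal sketch of that definition: it pulls out the sentences of an article that refer to a given figure number. It is not the LLaVA-Med authors' extraction code. The function name `find_inline_mentions`, the naive sentence splitter, and the "Fig."/"Figure" patterns are all assumptions made for illustration, and real papers will need more robust handling (figure ranges, supplementary figures, other abbreviations).

```python
import re
from typing import List

# Hypothetical helper, not part of the LLaVA-Med repository.
# Split on whitespace that follows ., ! or ?, but not right after the
# abbreviations "Fig." / "Figs." so that "Fig. 1" stays in one sentence.
_SENTENCE_SPLIT = re.compile(
    r"(?<!Fig\.)(?<!Figs\.)(?<=[.!?])\s+", re.IGNORECASE
)


def find_inline_mentions(article_text: str, figure_number: int) -> List[str]:
    """Return the sentences in article_text that refer to the given figure."""
    # Match "Figure 1", "Fig. 1" or "Fig 1"; the lookahead stops
    # figure 1 from also matching "Figure 12".
    mention = re.compile(
        rf"\bFig(?:ure|\.)?\s*{figure_number}(?!\d)", re.IGNORECASE
    )
    sentences = _SENTENCE_SPLIT.split(article_text.strip())
    return [s for s in sentences if mention.search(s)]


if __name__ == "__main__":
    text = (
        "We enrolled 40 patients. "
        "Figure 1 shows the chest X-ray of a patient. "
        "The opacity in Fig. 1 is consistent with pneumonia. "
        "Figure 12 shows a CT scan."
    )
    for sentence in find_inline_mentions(text, 1):
        print(sentence)
```

With the sample text, the call for figure 1 returns the second and third sentences and skips the one about Figure 12.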