haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.49k stars 2.27k forks source link

[Question] Text only training? #1467

Open SamuelSchmidgall opened 7 months ago

SamuelSchmidgall commented 7 months ago

Question

How do I format the data to do text only training??

ZzoomD commented 5 months ago

hi, bro, have you solved training using only text?

Tree-Shu-Zhao commented 3 days ago

Hi! May I ask if the performance dropped after fine-tuning on a text-only dataset? I have multimodal/text mixed samples. After fine-tuning, the performance of text-only queries significantly dropped. I'd like to know if you got similar results.

SamuelSchmidgall commented 3 days ago

Hey, @ZzoomD I ended up switching to prismatic-vlm which supports the mixed training.

@Tree-Shu-Zhao I did see improvements on both my text and vision problems, the performance did not get worse