haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
18.93k stars 2.07k forks

[Question] discussion about multiple images? #1072

Open jacekpoplawski opened 6 months ago

jacekpoplawski commented 6 months ago

Question

Hello, I started using LLaVA 1.6 34B a few days ago, and it's fantastic.

To me it's much better than ChatGPT, because ChatGPT refuses to load most photos. I am a photographer and I am trying to discuss my photos with an LLM, asking how to improve composition or postprocessing, and LLaVA is able to understand the details of each photo. This is awesome.

My question: is it possible to discuss multiple images? For example, I want to show the model an unprocessed photo, then show the processed photo and ask what the LLM thinks, or show a few photos and ask for the best one.

I was able to use the LLaVA 34B GGUF by loading it into llama-cpp (llava-cli and server). I am not able to use the unquantized model because I have only 24GB of VRAM.

pseudotensor commented 5 months ago

I don't think it's supported.

But support for multiple images, on the order of 10-20 as in claude-3 (20), gpt-4-vision (10), or gemini-pro-vision (16), is crucial for tasks such as document Q/A over images, since many documents can't be handled as just text.
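Until multi-image input is supported, one possible workaround (an assumption on my part, not something confirmed for LLaVA in this thread) is to paste the photos side by side into a single image and refer to them in the prompt as "left" and "right". Below is a minimal sketch using Pillow. The function names `plan_side_by_side` and `compose` and the 16-pixel gap are illustrative choices, and scaling the photos down to a shared height loses detail, so the model's answers may well be worse than with a single full-size photo.

```python
def plan_side_by_side(sizes, gap=16):
    """Compute a left-to-right layout for images given as (width, height) pairs.

    All images are scaled to the height of the shortest one.
    Returns ((canvas_w, canvas_h), [(x_offset, scaled_w, scaled_h), ...]).
    """
    if not sizes:
        raise ValueError("need at least one image size")
    height = min(h for _, h in sizes)
    boxes, x = [], 0
    for w, h in sizes:
        scaled_w = max(1, round(w * height / h))  # keep the aspect ratio
        boxes.append((x, scaled_w, height))
        x += scaled_w + gap
    return (x - gap, height), boxes  # drop the gap after the last image


def compose(paths, out_path, gap=16):
    """Paste the images at `paths` side by side and save the result."""
    from PIL import Image  # Pillow; imported here so the layout helper has no dependencies

    images = [Image.open(p).convert("RGB") for p in paths]
    canvas_size, boxes = plan_side_by_side([im.size for im in images], gap)
    canvas = Image.new("RGB", canvas_size, (255, 255, 255))  # white background
    for im, (x, w, h) in zip(images, boxes):
        canvas.paste(im.resize((w, h)), (x, 0))
    canvas.save(out_path)
    return canvas_size
```

For a before/after comparison, something like `compose(["unprocessed.jpg", "processed.jpg"], "before_after.jpg")` (hypothetical file names) would produce one image that can be passed wherever a single image is accepted, with a prompt such as "The left photo is unprocessed and the right one is processed. What do you think of the edit?"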

pseudotensor commented 5 months ago

@haotian-liu Are there any plans for this, or can the code be changed for fine-tuning?