Open xinsir6 opened 11 months ago
You might try putting two images into one by separating them with a horizontal line and including that detail in your prompt. I haven't tested this method extensively, but it seemed to work quite well in a quick trial I did:
If this works well enough for you, you could adjust the scripts to accept two images which then get merged in this way. I might create a quick CLI demo.
Good idea, I will try it. The only problem is that this method requires the two images to be resized to the same width/height.
I've made a simple proof of concept for you based on cli.py:
https://github.com/mapluisch/LLaVA-CLI-with-multiple-images
It doesn't resize (or truncate / crop) the images, just concatenates them vertically.
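For anyone who wants to try the idea without cloning the repo, here is a minimal sketch of the concatenation step. It is not the code from the linked repository. It assumes Pillow is installed, and the function name `stack_vertically` and its parameters are made up for illustration. The images keep their original sizes, and the narrower one is padded with a background color.

```python
from PIL import Image, ImageDraw


def stack_vertically(top, bottom, line_height=4,
                     line_color=(0, 0, 0), background=(255, 255, 255)):
    """Stack two images vertically with a horizontal separator line.

    Hypothetical helper for illustration: neither image is resized or
    cropped; the canvas is as wide as the wider image.
    """
    top = top.convert("RGB")
    bottom = bottom.convert("RGB")

    width = max(top.width, bottom.width)
    height = top.height + line_height + bottom.height
    canvas = Image.new("RGB", (width, height), background)

    # First image goes at the top-left corner.
    canvas.paste(top, (0, 0))

    # Separator line directly below the first image.
    if line_height > 0:
        draw = ImageDraw.Draw(canvas)
        draw.rectangle(
            [0, top.height, width - 1, top.height + line_height - 1],
            fill=line_color,
        )

    # Second image goes below the separator.
    canvas.paste(bottom, (0, top.height + line_height))
    return canvas
```

You would load the two files with `Image.open(...)`, call `stack_vertically(img_a, img_b)`, save the result, and pass that single file to the CLI. Mentioning in the prompt that the two images are separated by a horizontal line seems to help.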
I tried asking the model the way you suggested, but the model refuses to reply. Is there any way to solve this?
You should play around with different prompts and temperatures. I'm using 4-bit quantization on the 13b model, and this prompt works:
Analyze the two images and tell me which one is better and why
@mapluisch Hello! It is true that you don’t need to resize the two images when you just concatenate them vertically, but I think the model does resize them to fit when feeding them to the CLIP image encoder.
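To get a rough feel for how much detail each image keeps, here is a small back-of-the-envelope sketch. It assumes the encoder takes a square 336-pixel input and that the stacked canvas is scaled to fit inside that square with its aspect ratio preserved. Both are assumptions: the actual preprocessing depends on the model and its configuration, and the function name `effective_size` is made up for illustration.

```python
def effective_size(image_size, canvas_size, encoder_side=336):
    """Estimate the (width, height) one image ends up with after the
    whole canvas is scaled to fit a square encoder input.

    Assumption: aspect-preserving fit into encoder_side x encoder_side.
    """
    canvas_w, canvas_h = canvas_size
    scale = min(encoder_side / canvas_w, encoder_side / canvas_h)
    image_w, image_h = image_size
    return round(image_w * scale), round(image_h * scale)


# One 640x480 image on its own canvas.
print(effective_size((640, 480), (640, 480)))  # (336, 252)

# The same image as one half of a 640x960 vertical stack.
print(effective_size((640, 480), (640, 960)))  # (224, 168)
```

Under these assumptions, each image in a two-image stack gets less than half the pixels it would get on its own, so fine details may be harder for the model to pick up.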
Question
I have a problem using LLaVA to process multiple images at the same time: for example, giving the model two or more images and asking it questions about them, such as "Which one do you like better?" The web demo and chat can't handle this, so could you provide a dedicated script for it?