mckaywrigley / chatbot-ui

MIT License
28.83k stars 8.04k forks source link

Only 1 image per convo is working in Vision – subsequently added images give an error #1731

Open siavashvj opened 6 months ago

siavashvj commented 6 months ago

When uploading an image to a vision model (Opus/Vision/4o) in a convo, the image is added to the context just fine. But if you subsequently upload another image to the same convo, no response is given.

Steps to reproduce:

  1. Start new convo with a vision model (Opus / gpt4-vision / gpt4o)
  2. Upload image and ask to describe
  3. Upload new image and ask to describe.

rossman22590 commented 6 months ago

same!

faraday commented 5 months ago

The Gemini Vision model is limited to single-message use. Gemini Flash supports conversational multi-message scenarios, though.

Recently, the scenario you mention was also fixed by using the last image message. So effectively, when using Gemini Vision, it can't account for every image it has seen and come up with a high-level answer covering all the images provided. However, it is now able to keep responding.

Apart from the Gemini limitation, I just tested the Anthropic side with Haiku and it's working fine. The Anthropic model (as with Gemini Flash) can also account for all the images it has seen.
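
To illustrate the difference between the two behaviours described above, here is a minimal sketch. It is not the code that was merged: the `ChatTurn` and `Part` types and both function names are hypothetical and do not reflect chatbot-ui's actual message shape. It only shows the idea of sending the last image message to a single-message model versus sending the whole history to a multi-turn model.

```typescript
// Hypothetical message shape, assumed for illustration only.
type Part =
  | { kind: "text"; text: string }
  | { kind: "image"; dataUrl: string };

interface ChatTurn {
  role: "user" | "assistant";
  parts: Part[];
}

// True if the turn carries at least one image part.
const hasImage = (turn: ChatTurn): boolean =>
  turn.parts.some((p) => p.kind === "image");

// Single-message vision model: keep only the most recent turn that
// contains an image. Earlier images are dropped, so the model cannot
// reason across all images in the conversation.
function selectForSingleMessageModel(history: ChatTurn[]): ChatTurn[] {
  for (let i = history.length - 1; i >= 0; i--) {
    const turn = history[i];
    if (turn && hasImage(turn)) return [turn];
  }
  // No image anywhere: fall back to the latest turn (if any).
  return history.slice(-1);
}

// Multi-turn model: pass the full history, images included.
function selectForMultiTurnModel(history: ChatTurn[]): ChatTurn[] {
  return history;
}

// Example conversation with two uploaded images.
const exampleHistory: ChatTurn[] = [
  {
    role: "user",
    parts: [
      { kind: "text", text: "Describe this image." },
      { kind: "image", dataUrl: "data:image/png;base64,AAA" },
    ],
  },
  { role: "assistant", parts: [{ kind: "text", text: "A cat." }] },
  {
    role: "user",
    parts: [
      { kind: "text", text: "Describe this one." },
      { kind: "image", dataUrl: "data:image/png;base64,BBB" },
    ],
  },
];

console.log(selectForSingleMessageModel(exampleHistory).length);
console.log(selectForMultiTurnModel(exampleHistory).length);
```

With the single-message strategy only the second upload reaches the model, which matches the limitation described above: the conversation keeps responding, but answers cannot draw on earlier images.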

NuerSir commented 5 months ago

My understanding is that Gemini Vision is just a transitional model, while Gemini 1.5 Pro and Gemini 1.5 Flash already support multi-turn conversations with images.

I have made some temporary modifications to enable the use of the new models and support multi-turn image conversations.

ps: This modification is just a quick fix to achieve the goal. I hope the project author can optimize it to be simpler.

[three attached images]

faraday commented 5 months ago

This is already merged. It's working in main.

faraday commented 5 months ago

@NuerSir That was what I was trying to say. I've modified the message handling for Gemini Vision and upgraded the library to comply with this change. The current state of the main branch is working just fine.

faraday commented 5 months ago

Please check against the current, up-to-date main branch.