Your current environment

🐛 Describe the bug

Hi y'all, I am accessing Phi 3 Vision via the API's v1/chat/completions endpoint. It seems that if the connection remains open, I get the following error after making two requests with two different images:

If I ensure that each request is made on a new connection, the error does not occur. My only guess is that this line should have an await added to it. I will open a PR referencing this issue with that change; let me know if you all agree.
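For reference, a minimal sketch of the two cases (the server address, model id, and image URLs below are placeholders I am assuming, not taken from the actual setup):

```python
import requests

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed vLLM server address
IMAGE_URLS = [
    "https://example.com/image1.jpg",  # placeholder image URLs
    "https://example.com/image2.jpg",
]

def make_payload(image_url: str) -> dict:
    """Build a single-image chat completion request (assumed model id)."""
    return {
        "model": "microsoft/Phi-3-vision-128k-instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Case 1: reuse one keep-alive connection; the error appears on the second request.
session = requests.Session()
for url in IMAGE_URLS:
    resp = session.post(BASE_URL, json=make_payload(url))
    print(resp.status_code)

# Case 2: open a fresh connection per request; no error occurs.
for url in IMAGE_URLS:
    resp = requests.post(BASE_URL, json=make_payload(url))  # new connection each time
    print(resp.status_code)
```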
Hi y'all, I looked into this a bit more. I am using open-webui, and I believe what is happening is that multiple images are being passed to the model as context from earlier turns in the conversation. I am not 100% sure about this, but if that is the case, then this condition is hit on every request, and the underlying issue is simply that vLLM has no support for multi-image inference right now.
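For illustration, a request accumulating images from earlier turns might look like the sketch below, with two image_url parts in one message; the model id and URLs are assumptions, and whether open-webui actually builds the payload this way is my guess:

```python
import json

# Hypothetical multi-image payload, the shape that would trip vLLM's
# single-image check (model id and URLs are placeholders).
payload = {
    "model": "microsoft/Phi-3-vision-128k-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images."},
                {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},
            ],
        }
    ],
}
print(json.dumps(payload, indent=2))
```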
Hey @thealmightygrant! If your context does have multiple images (i.e., multiple image_url entries in the messages), then this is indeed not currently supported by vLLM. (In fact, I believe none of the vision language models we support on vLLM today support multi-image inference themselves either.) Thanks!
Closing, as I confirmed this is what is occurring. Please reference #4194 for ongoing refactoring of visual language models.
Oh btw, phi 3 vision is working great for us! Thanks for the hard work @ywang96!
Very glad to hear that! :)