matatonic / openedai-vision

An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal.
GNU Affero General Public License v3.0
204 stars 17 forks source link

Support for Qwen2-VL #15

Closed Backendmagier closed 2 months ago

Backendmagier commented 2 months ago

i think Qwen2-VL is not supported yet. Its currently SOTA. In my tests it has performed brilliant.

here is the link https://github.com/QwenLM/Qwen2-VL

Would love to have that!

Many Thanks in Advance!

saket424 commented 2 months ago

Alibaba Launches Qwen2-VL, Surpasses GPT-4o & Claude 3.5 Sonnet https://analyticsindiamag.com/ai-news-updates/alibaba-launches-qwen2-vl-surpasses-gpt-4o-claude-3-5-sonnet/

matatonic commented 2 months ago

There is a qwen2-vl branch available if anyone is interested in testing it, no pre-build image however, so you would need to build it yourself. The GPTQ version of the model are not working yet (seems to generate tokens, but nothing comes out), but the main and AWQ models work for me. Video probably doesn't work without some hackery, and maybe not at all (untested so far).

matatonic commented 2 months ago

data: uri's may not work yet, just FYI. urls seem to work fine.

https://github.com/QwenLM/Qwen2-VL/issues/202

Update: worked around for now.