turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

[REQUEST] Support for a Qwen based vision model #672

Open TyraVex opened 1 week ago

TyraVex commented 1 week ago

Problem

Hello,

I'm very pleased to see exllama getting vision capabilities for the first time with Pixtral!

You hinted at supporting new models in the release notes. Which models are you hoping to support?

Solution

If I may suggest a few ideas, Qwen-based vision models are the SOTA as of writing. Support for Qwen2-VL and/or NVLM-D could be a huge step forward.

Alternatives

No response

Explanation

Support for either of these beasts:

https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct
https://huggingface.co/nvidia/NVLM-D-72B

Examples

No response

Additional context

Forgot to mention that the Qwen2-VL model family comes in multiple sizes (2B, 7B, 72B), which could be convenient for the GPU-poor community.

Acknowledgements

turboderp commented 4 days ago

Qwen2-VL is supported (images at least, not video just yet) on the dev branch. NVLM-D looks interesting, and I might consider it next, once Qwen2-VL support is complete.

TyraVex commented 4 days ago

It's Christmas every day here. Thank you so much, this is so useful. I have plenty of projects that will rely on this feature :)