turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Support for multimodal models #285

Open ParisNeo opened 5 months ago

ParisNeo commented 5 months ago

Hi there. Thank you for this cool binding. I am using it in lollms, and it is the fastest backend for local models that doesn't rely on a separate service, on par with ollama but with much lower latency.

I was just wondering whether there is support for multimodal models like LLaVA. It would be great to have a code example; I could add it to the lollms multimodal workflow, which would really make it the coolest option.

For now it only does text generation.
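
For context, a LLaVA-style pipeline pairs a CLIP vision encoder and a small projector with an ordinary LLM: the projected image embeddings are spliced into the token stream where an `<image>` placeholder sits, and decoding then proceeds as with any text model. Below is a minimal sketch using Hugging Face transformers, not exllamav2's API (which, per this issue, is text-only so far); the checkpoint id, image path, and prompt template are assumptions.

```python
# Sketch of LLaVA-style multimodal inference via Hugging Face transformers.
# NOT exllamav2 code -- for illustration of what multimodal support entails.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # example checkpoint (assumption)
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The processor tokenizes the text and preprocesses the image; the model
# encodes the image with its vision tower, projects the features, and
# inserts them at the <image> placeholder before generating.
image = Image.open("photo.jpg")  # hypothetical local image
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```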