Open SuperPat45 opened 5 days ago
Since yesterday vllm has internVL2 support. :-)
I guess that would work already with llama.cpp GGUF models if/when is getting supported in there ( see also https://github.com/ggerganov/llama.cpp/issues/9440 ).
I'd change the focus of this one to be more generic and add support for multimodal with vLLM, examples:
https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_pixtral.py https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py
Add the new Multi-Modal model of mistral AI: pixtral-12b:
https://huggingface.co/mistral-community/pixtral-12b-240910
It supports image encoder, can it also be added to the image generator API as an alternative to Stable Diffusion?