mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License
23.36k stars 1.78k forks source link

Add the new Multi-Modal model of mistral AI: pixtral-12b #3535

Open SuperPat45 opened 5 days ago

SuperPat45 commented 5 days ago

Add the new Multi-Modal model of mistral AI: pixtral-12b:

https://huggingface.co/mistral-community/pixtral-12b-240910

It supports image encoder, can it also be added to the image generator API as an alternative to Stable Diffusion?

AlexM4H commented 4 days ago

Since yesterday vllm has internVL2 support. :-)

vllm-project/vllm/releases/tag/v0.6.1

mudler commented 4 days ago

I guess that would work already with llama.cpp GGUF models if/when is getting supported in there ( see also https://github.com/ggerganov/llama.cpp/issues/9440 ).

I'd change the focus of this one to be more generic and add support for multimodal with vLLM, examples:

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_pixtral.py https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py