This pull request introduces support for the Llama 3.2-Vision collection of multimodal large language models (LLMs) in Xinference. These models can process both text and image inputs, broadening the range of applications Xinference can serve.
Key Changes:
- Expanded model support: adds the Llama 3.2-Vision and Llama 3.2-Vision-Instruct models to the list of supported models, accessible through both the transformers and vllm engines (see the launch sketch after this list).
- vllm engine enhancement: updates the vllm engine to accommodate the specific requirements of the Llama 3.2-Vision models.
- Documentation updates: extends the documentation with details about the newly supported models to guide users in using them effectively.
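For illustration, here is a minimal sketch of how launching one of these models could look from the Python client once this change lands. The model name `llama-3.2-vision-instruct` and the endpoint URL are assumptions for this example; the exact registered name comes from the model definitions added in this PR.

```python
# Minimal sketch: launching a Llama 3.2-Vision model via the Xinference client.
# Assumptions: a local Xinference server on the default port, and the
# hypothetical model name "llama-3.2-vision-instruct".
from xinference.client import Client

client = Client("http://localhost:9997")  # default Xinference endpoint

model_uid = client.launch_model(
    model_name="llama-3.2-vision-instruct",  # hypothetical registered name
    model_engine="vllm",                     # or "transformers", per this PR
)
print(f"Launched model with uid: {model_uid}")
```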
Implementation details:
- Updated llm_family.json and llm_family_modelscope.json to include Llama 3.2-Vision and Llama 3.2-Vision-Instruct model information.
- Modified the vllm engine's core.py to handle these models.
- Updated the model reference files in the documentation to reflect the newly supported built-in models.
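Since Xinference exposes an OpenAI-compatible API, a vision model launched this way should also be reachable with a standard text-plus-image chat request. The sketch below assumes the same hypothetical model name and a placeholder image URL:

```python
# Hedged sketch: sending a mixed text + image request through Xinference's
# OpenAI-compatible endpoint. Model name and image URL are placeholders.
import openai

client = openai.OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

response = client.chat.completions.create(
    model="llama-3.2-vision-instruct",  # hypothetical registered name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The message format follows the OpenAI vision convention of mixed `text` and `image_url` content parts, which is what multimodal chat endpoints generally expect.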