System Info / 系統信息
Jetson AGX Orin 64GB jetpack 6.0
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
No, running natively on the Jetson.
Version info / 版本信息
0.16.1
The command used to start Xinference / 用以启动 xinference 的命令
X_INFERENCE_HOME=/mnt/data/x_inference_data/data XINFERENCE_MODEL_SRC=modelscope xinference-local --host 127.0.0.1 --port 9997
Reproduction / 复现过程
Expected behavior / 期待表现
The GPU should be used for inference so that chat responses are faster. (原文：希望调用 GPU，提高对话速度。)
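One way to narrow this down is to confirm, before starting `xinference-local`, whether the inference backend can see the CUDA device at all. The sketch below is a hypothetical diagnostic (not part of Xinference): it probes PyTorch for CUDA availability, falling back gracefully if `torch` is not installed, and picks the device string accordingly.

```python
# Hedged sketch: check CUDA visibility before launching xinference-local.
# `pick_device` is a hypothetical helper for illustration, not a Xinference API.

def pick_device(cuda_available: bool) -> str:
    # Prefer the GPU when the CUDA runtime is visible; otherwise fall back to CPU.
    return "cuda" if cuda_available else "cpu"

try:
    import torch
    cuda = torch.cuda.is_available()
except ImportError:
    # torch missing entirely would itself explain CPU-only inference.
    cuda = False

print(f"device = {pick_device(cuda)}")
```

If this prints `device = cpu` on the AGX Orin, the installed PyTorch wheel likely lacks CUDA support for JetPack 6.0, which would explain slow CPU-bound chat regardless of Xinference configuration.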