Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
System Info / 系統信息
CUDA Version: 12.0 Python Version: 3.10
Running Xinference with Docker? / 是否使用 Docker 运行 Xinfernece?
Version info / 版本信息
xinference v0.15.4
The command used to start Xinference / 用以启动 xinference 的命令
nohup xinference-local -H 192.22.139.188 -p 59997 &
Reproduction / 复现过程
XInference 加载 Qwen2- 72B-Instruct 模型,使用 Dify (version 0.10.0)Agent 模式调用该大模型报错。
Expected behavior / 期待表现
期待解决。