xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io

xoscar.errors.ServerClosed: [address=0.0.0.0:13276, pid=39] Remote server unixsocket:///20447232 closed #2583

Open erliang-sf opened 6 hours ago

erliang-sf commented 6 hours ago

System Info

SERVER: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
PRETTY_NAME: "Debian GNU/Linux 11 (bullseye)"
Python: 3.11.5
conda: 23.10.0
torch: 2.4.1+cpu

Running Xinference with Docker?

Yes, with Docker (see the start command below).

Version info

xinference: 0.15.4

The command used to start Xinference

```
docker run -it -e XINFERENCE_MODEL_SRC=modelscope -p 9996:9997 -v ./xinference:/root/.xinference --name xinference-cpu image_name xinference-local -H 0.0.0.0 --log-level debug
```
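For reference, a quick way to confirm the server is reachable from the host before launching anything (a minimal sketch, not part of the original report; it assumes the 9996:9997 port mapping above and uses the OpenAI-compatible /v1/models endpoint that Xinference exposes):

```python
import requests

# Host port 9996 is mapped to the supervisor's 9997 inside the container
# (see the docker run command above), so clients on the host must use 9996.
resp = requests.get("http://localhost:9996/v1/models", timeout=10)
resp.raise_for_status()
print(resp.json())  # lists launched models; empty until a model is launched
```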

Reproduction

Launch the model from the web UI with the following settings (a Python-client equivalent is sketched below):

- model: qwen2.5-instruct
- model_format: ggufv2
- model_size: 7
- quantization: q4_k_m
- N GPU layers: 1
- replica: 1
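For anyone reproducing this outside the web UI, the same launch can be expressed through the Python client (a sketch under assumptions: the kwarg names follow the xinference Client API, model_engine may be required on recent versions, and n_gpu_layers is assumed to be forwarded to the llama.cpp backend as an extra kwarg):

```python
from xinference.client import Client

client = Client("http://localhost:9996")  # host port from the docker run above

model_uid = client.launch_model(
    model_name="qwen2.5-instruct",
    model_engine="llama.cpp",      # engine selection; may be required on recent versions
    model_format="ggufv2",
    model_size_in_billions=7,
    quantization="q4_k_m",
    replica=1,
    n_gpu_layers=1,                # assumed to be passed through to llama-cpp-python
)
print(model_uid)
```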

Expected behavior

What I'd like to know is whether this is a tool configuration issue, an instruction-set support issue, or something else. When I run the same quantized model with llama.cpp on a different server, I don't get this error. If needed, I can provide the configuration that works correctly. Thank you.
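One way to test the instruction-set hypothesis (a diagnostic sketch, not part of the original report): a prebuilt llama-cpp-python wheel compiled for instructions the CPU lacks usually kills the worker process abruptly, which would surface as exactly the ServerClosed error above. Checking which SIMD flags the E5-2620 v4 advertises inside the container narrows this down:

```python
# Print the SIMD-related CPU flags visible to the process (Linux only).
# An E5-2620 v4 (Broadwell) should show sse*/avx/avx2/fma/f16c but no avx512*.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for flag in sorted(flags):
    if flag.startswith(("sse", "avx", "fma", "f16c")):
        print(flag)
```

If a flag the wheel was built for is missing, rebuilding llama-cpp-python from source on the affected machine (so it compiles for the local CPU) is a common fix.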

erliang-sf commented 6 hours ago

To add: llama_cpp_python==0.3.1. The container that runs successfully on the other server uses the same versions of Python, llama_cpp_python, conda, and torch.
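Since both servers run the same software versions, loading the GGUF file directly with llama-cpp-python inside the failing container would isolate whether the crash comes from llama.cpp itself or from the Xinference worker layer (a sketch; the model path below is hypothetical and should point at whatever file Xinference actually downloaded under /root/.xinference):

```python
from llama_cpp import Llama

# Hypothetical path: substitute the GGUF file Xinference downloaded
# (look under /root/.xinference inside the container).
MODEL_PATH = "/root/.xinference/cache/qwen2.5-instruct/model-q4_k_m.gguf"

# Mirror the web-UI launch settings. If this call alone crashes the process
# (e.g. an illegal-instruction abort), the fault is in llama.cpp, not Xinference.
llm = Llama(model_path=MODEL_PATH, n_gpu_layers=1, verbose=True)
out = llm("Hello", max_tokens=8)
print(out["choices"][0]["text"])
```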