Additional note: llama_cpp_python==0.3.1 is installed, and the container that runs successfully on the other server uses the same versions of Python, llama_cpp_python, conda, and torch.
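A quick way to confirm the two environments really match is to print the relevant versions on both servers; for example (just a sketch, not output from the report):

```python
# Print the versions that should match between the working server and the failing one.
import sys

import torch
import llama_cpp  # provided by the llama_cpp_python package

print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("llama_cpp_python:", llama_cpp.__version__)
```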
System Info / 系統信息
SERVER: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
PRETTY_NAME: "Debian GNU/Linux 11 (bullseye)"
python: 3.11.5
conda: 23.10.0
torch: 2.4.1+cpu
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Yes, Xinference runs in Docker (see the command below).
Version info / 版本信息
xinference: 0.15.4
The command used to start Xinference / 用以启动 xinference 的命令
docker run -it -e XINFERENCE_MODEL_SRC=modelscope -p 9996:9997 -v ./xinference:/root/.xinference --name xinference-cpu image_name xinference-local -H 0.0.0.0 --log-level debug
Reproduction / 复现过程
Set in the web UI:
launch_model: qwen2.5-instruct
model_format: ggufv2
model_size: 7
quantization: q4_k_m
N GPU layers: 1
replica: 1
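Roughly the equivalent launch through the Python client, if I understand the Client API in 0.15.x correctly (parameter names, especially model_engine and n_gpu_layers, may differ by version; this is a sketch, not the exact call the web UI issues):

```python
from xinference.client import Client

# Host port 9996 is mapped to the container's 9997 in the docker command above.
client = Client("http://localhost:9996")

model_uid = client.launch_model(
    model_name="qwen2.5-instruct",
    model_engine="llama.cpp",        # gguf models run on the llama.cpp engine
    model_format="ggufv2",
    model_size_in_billions=7,
    quantization="q4_k_m",
    replica=1,
    n_gpu_layers=1,                  # forwarded to llama-cpp-python as an extra kwarg
)
print(model_uid)
```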
Expected behavior / 期待表现
What I'd like to know is whether this is a tool configuration issue, an instruction set support issue, or something else. When I configure the same quantized model with llama.cpp on a different server, I don't get this error. If needed, I can provide the configuration that works correctly. Thank you.
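If it helps with the instruction-set question: the E5-2620 v4 (Broadwell) supports AVX2 but not AVX-512, so one thing worth comparing across the two servers is the SIMD flags each CPU reports. A simple check like this (just a sketch, not something from the logs) could be run on both machines:

```python
# Print the SIMD-related CPU flags so the two servers can be compared;
# a llama-cpp-python build compiled for instructions the CPU lacks is a
# common cause of illegal-instruction crashes.
with open("/proc/cpuinfo") as f:
    flags = set()
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

print(sorted(f for f in flags if f.startswith(("sse", "avx", "fma", "f16c"))))
```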