xinference v0.15.4: launching a model fails with RuntimeError: Failed to launch model, detail: [address=, pid=108] No available slot found for the model #2455

Closed songleipu123 closed 2 days ago

songleipu123 commented 1 week ago

System Info

ubuntu 22.04
xinference:v0.15.4

Running Xinference with Docker?

Version info

xinference, version 0.15.4

The command used to start Xinference

docker run -d --name xinference -e XINFERENCE_MODEL_SRC=modelscope -e HF_ENDPOINT=https://hf-mirror.com -p 9998:9997 --gpus device=1 --shm-size=128g xprobe/xinference xinference-local -H --log-level debug

xinference launch --model-name bge-reranker-v2-m3 --model-type rerank
xinference launch --model-name ChatTTS --model-type audio

Reproduction

1. Tried switching to the older versions v0.15.3 and v0.15.2; neither worked.
2. Tried downgrading sentence-transformers to 3.1.0 and 3.1.1; neither worked.

xinference launch --model-name bge-reranker-v2-m3 --model-type rerank

Traceback (most recent call last):
  File "/usr/local/bin/xinference", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/cmdline.py", line 901, in model_launch
    model_uid = client.launch_model(
  File "/usr/local/lib/python3.10/dist-packages/xinference/client/restful/restful_client.py", line 940, in launch_model
    raise RuntimeError(
RuntimeError: Failed to launch model, detail: [address=, pid=108] No available slot found for the model

Expected behavior


qinxuye commented 2 days ago


When using an LLM together with embedding/rerank models, you can load the embedding/rerank model first and then load the LLM.
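
For anyone scripting their launches, the ordering suggested above could be sketched roughly as follows. This is only an illustration, not something from the Xinference project: it reuses the `xinference launch --model-name ... --model-type ...` form from the report, the helper names (`order_launches`, `build_commands`, `launch_all`) and the model name `my-llm` are made up, and a real LLM launch will probably need extra flags that are not shown. The thread does not confirm that this ordering fixes the error in every setup.

```python
import subprocess
from typing import Dict, List

# Model types to launch before any LLM, following the suggestion above.
LAUNCH_FIRST = {"embedding", "rerank"}


def order_launches(models: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Put embedding/rerank entries ahead of all other model types.

    sorted() is stable, so the original order inside each group is kept.
    """
    return sorted(models, key=lambda m: m["model_type"] not in LAUNCH_FIRST)


def build_commands(models: List[Dict[str, str]]) -> List[List[str]]:
    """Build one `xinference launch` argument list per model, in launch order."""
    return [
        ["xinference", "launch",
         "--model-name", m["model_name"],
         "--model-type", m["model_type"]]
        for m in order_launches(models)
    ]


def launch_all(models: List[Dict[str, str]], dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for cmd in build_commands(models):
        print(" ".join(cmd))
        if not dry_run:
            # Stops at the first failing launch (raises CalledProcessError).
            subprocess.run(cmd, check=True)
```

Calling `launch_all([...])` with the default `dry_run=True` only prints the commands, so the order can be checked before anything is actually started inside the container.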

songleipu123 commented 21 hours ago


