xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

xinference v0.15.4: launching a model fails with RuntimeError: Failed to launch model, detail: [address=0.0.0.0:56145, pid=108] No available slot found for the model #2455

Closed · songleipu123 closed this 2 days ago

songleipu123 commented 1 week ago

System Info

Ubuntu 22.04, xinference v0.15.4. Installed packages (`pip list` output, flattened):


accelerate 0.34.0 aiofiles 23.2.1 aiohappyeyeballs 2.4.0 aiohttp 3.10.5 aioprometheus 23.12.0 aiosignal 1.3.1 aliyun-python-sdk-core 2.16.0 aliyun-python-sdk-kms 2.16.5 altair 5.4.1 annotated-types 0.7.0 anthropic 0.36.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 argcomplete 3.5.1 async-timeout 4.0.3 attrdict 2.0.1 attrs 24.2.0 audioread 3.0.1 auto_gptq 0.7.1 autoawq 0.2.5 autoawq_kernels 0.0.6 av 13.1.0 bcrypt 4.2.0 beautifulsoup4 4.12.3 bitsandbytes 0.44.1 black 24.10.0 boto3 1.28.64 botocore 1.31.85 cdifflib 1.2.6 certifi 2019.11.28 cffi 1.17.1 chardet 3.0.4 charset-normalizer 3.3.2 chattts 0.1.1 click 8.1.7 cloudpickle 3.0.0 colorama 0.4.6 coloredlogs 15.0.1 conformer 0.3.2 contourpy 1.3.0 controlnet-aux 0.0.7 crcmod 1.7 cryptography 43.0.1 cycler 0.12.1 Cython 3.0.11 datamodel-code-generator 0.26.1 datasets 2.21.0 dbus-python 1.2.16 decorator 5.1.1 DeepCache 0.1.1 diffusers 0.30.3 dill 0.3.8 diskcache 5.6.3 distro 1.9.0 distro-info 0.23+ubuntu1.1 dnspython 2.7.0 ecdsa 0.19.0 editdistance 0.8.1 einops 0.8.0 einx 0.3.0 email_validator 2.2.0 encodec 0.1.1 eva-decord 0.6.1 exceptiongroup 1.2.2 fastapi 0.110.3 ffmpy 0.4.0 filelock 3.15.4 FlagEmbedding 1.2.11 flashinfer 0.1.6+cu124torch2.4 flatbuffers 24.3.25 fonttools 4.54.1 frozendict 2.4.5 frozenlist 1.4.1 fsspec 2024.6.1 funasr 1.1.12 fvcore 0.1.5.post20221221 gdown 5.2.0 gekko 1.2.1 genson 1.3.0 gguf 0.9.1 gradio 4.26.0 gradio_client 0.15.1 h11 0.14.0 hf_transfer 0.1.8 hiredis 3.0.0 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.2 huggingface-hub 0.24.6 humanfriendly 10.0 hydra-core 1.3.2 HyperPyYAML 1.2.2 idna 2.8 imageio 2.35.1 imageio-ffmpeg 0.5.1 importlib_metadata 8.4.0 importlib_resources 6.4.5 inflect 5.6.2 interegular 0.3.3 iopath 0.1.10 isort 5.13.2 jaconv 0.4.0 jamo 0.4.1 jieba 0.42.1 Jinja2 3.1.4 jiter 0.5.0 jj-pytorchvideo 0.1.5 jmespath 0.10.0 joblib 1.4.2 jsonschema 4.23.0 jsonschema-specifications 2023.12.1 kaldiio 2.18.0 kiwisolver 1.4.7 lark 1.2.2 lazy_loader 0.4 libnacl 2.1.0 librosa 0.10.2.post1 
lightning 2.4.0 lightning-utilities 0.11.7 litellm 1.49.1 llama_cpp_python 0.2.90 llvmlite 0.43.0 lm-format-enforcer 0.10.6 loguru 0.7.2 loralib 0.1.2 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.2 mdurl 0.1.2 mistral_common 1.3.4 modelscope 1.17.1 mpmath 1.3.0 msgpack 1.0.8 msgspec 0.18.6 multidict 6.0.5 multiprocess 0.70.16 mypy-extensions 1.0.0 narwhals 1.9.3 natsort 8.4.0 nemo_text_processing 1.0.2 nest-asyncio 1.6.0 networkx 3.3 numba 0.60.0 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-ml-py 12.560.30 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.6.68 nvidia-nvtx-cu12 12.1.105 omegaconf 2.3.0 onnxruntime-gpu 1.16.0 openai 1.51.2 opencv-contrib-python-headless 4.10.0.84 opencv-python 4.10.0.84 optimum 1.23.1 orjson 3.10.7 ormsgpack 1.5.0 oss2 2.19.0 outlines 0.0.46 packaging 24.1 pandas 2.2.2 parameterized 0.9.0 partial-json-parser 0.2.1.1.post4 passlib 1.7.4 pathspec 0.12.1 peft 0.13.2 pillow 10.4.0 pip 24.2 platformdirs 4.3.6 plumbum 1.9.0 pooch 1.8.2 portalocker 2.10.1 prometheus_client 0.20.0 prometheus-fastapi-instrumentator 7.0.0 protobuf 5.28.0 psutil 6.0.0 py-cpuinfo 9.0.0 pyairports 2.1.1 pyarrow 17.0.0 pyasn1 0.6.1 pybase16384 0.3.7 pycountry 24.6.1 pycparser 2.22 pycryptodome 3.21.0 pydantic 2.8.2 pydantic_core 2.20.1 pydub 0.25.1 Pygments 2.18.0 PyGObject 3.36.0 pynini 2.1.5 pynndescent 0.5.13 pyparsing 3.1.4 PySocks 1.7.1 python-apt 2.0.1+ubuntu0.20.4.1 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-jose 3.3.0 python-multipart 0.0.12 pytorch-lightning 2.4.0 pytorch-wpe 0.0.1 pytz 2024.1 PyYAML 6.0.2 pyzmq 26.2.0 quantile-python 1.1 qwen-vl-utils 0.0.8 ray 2.35.0 redis 5.1.1 referencing 0.35.1 regex 2024.7.24 requests 2.32.3 requests-unixsocket 0.2.0 rich 13.9.2 rouge 1.0.1 rpds-py 
0.20.0 rpyc 6.0.1 rsa 4.9 ruamel.yaml 0.18.6 ruamel.yaml.clib 0.2.8 ruff 0.6.9 s3transfer 0.7.0 sacremoses 0.1.1 safetensors 0.4.4 scikit-image 0.24.0 scikit-learn 1.5.2 scipy 1.14.1 semantic-version 2.10.0 sentence-transformers 3.1.0 sentencepiece 0.2.0 setuptools 75.1.0 sglang 0.3.3.post1 shellingham 1.5.4 six 1.14.0 sniffio 1.3.1 soundfile 0.12.1 soupsieve 2.6 soxr 0.5.0.post1 sse-starlette 2.1.3 starlette 0.37.2 sympy 1.13.2 tabulate 0.9.0 tblib 3.0.0 tensorboardX 2.6.2.2 tensorizer 2.9.0 termcolor 2.5.0 threadpoolctl 3.5.0 tifffile 2024.9.20 tiktoken 0.7.0 timm 1.0.9 tokenizers 0.19.1 toml 0.10.2 tomli 2.0.2 tomlkit 0.12.0 torch 2.4.0 torch-complex 0.4.4 torchaudio 2.4.0 torchmetrics 1.4.3 torchvision 0.19.0 tqdm 4.66.5 transformers 4.44.2 transformers-stream-generator 0.0.5 triton 3.0.0 typer 0.11.1 typing_extensions 4.12.2 tzdata 2024.1 umap-learn 0.5.6 unattended-upgrades 0.1 urllib3 2.0.7 uvicorn 0.30.6 uvloop 0.20.0 vector-quantize-pytorch 1.18.1 vllm 0.6.0 vllm-flash-attn 2.6.1 vocos 0.1.0 watchfiles 0.24.0 websockets 11.0.3 WeTextProcessing 1.0.3 wget 3.2 wheel 0.34.2 wrapt 1.16.0 xformers 0.0.27.post2 xinference 0.15.4 xoscar 0.3.3 xxhash 3.5.0 yacs 0.1.8 yarl 1.9.9 zipp 3.20.1 zmq 0.0.0 zstandard 0.23.0

Running Xinference with Docker?

Version info

xinference, version 0.15.4

The command used to start Xinference

docker run -d --name xinference -e XINFERENCE_MODEL_SRC=modelscope -e HF_ENDPOINT=https://hf-mirror.com -p 9998:9997 --gpus device=1 --shm-size=128g xprobe/xinference xinference-local -H 0.0.0.0 --log-level debug

xinference launch --model-name bge-reranker-v2-m3 --model-type rerank
xinference launch --model-name ChatTTS --model-type audio

Reproduction

1. Tried the older versions v0.15.3 and v0.15.2; neither worked.
2. Tried downgrading sentence-transformers to 3.1.0 and 3.1.1; neither worked either.

xinference launch --model-name bge-reranker-v2-m3 --model-type rerank

Traceback (most recent call last):
  File "/usr/local/bin/xinference", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/cmdline.py", line 901, in model_launch
    model_uid = client.launch_model(
  File "/usr/local/lib/python3.10/dist-packages/xinference/client/restful/restful_client.py", line 940, in launch_model
    raise RuntimeError(
RuntimeError: Failed to launch model, detail: [address=0.0.0.0:56145, pid=108] No available slot found for the model
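The traceback shows the error is raised client-side by `launch_model` in `restful_client.py`, so a caller can catch it and retry once GPU memory frees up. A minimal sketch, assuming a `launch_fn` callable wraps the actual launch call; the helper and its retry policy are hypothetical, not part of Xinference:

```python
import time

def launch_with_retry(launch_fn, attempts=3, delay_s=5.0):
    """Call `launch_fn` until it succeeds, retrying only on the
    'No available slot found' RuntimeError raised by the client."""
    last_err = None
    for _ in range(attempts):
        try:
            return launch_fn()
        except RuntimeError as err:
            if "No available slot found" not in str(err):
                raise  # unrelated failure: re-raise immediately
            last_err = err
            time.sleep(delay_s)  # wait for GPU memory to be released
    raise last_err
```

This only helps if whatever is holding the GPU actually releases it between attempts; otherwise the final retry re-raises the same error.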

Expected behavior

The model should launch successfully.

qinxuye commented 2 days ago

The GPU is occupied by other models; stop them first, then launch.

When using an LLM together with embedding/rerank models, load the embedding/rerank models first, then the LLM.
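Since "No available slot" here means the GPU's memory is already taken, it can help to verify free memory before launching. A small sketch that parses the output of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`; the helper names and the required-memory threshold are illustrative, not part of Xinference:

```python
def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output into a list of
    (used_mib, total_mib) tuples, one per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        gpus.append((used, total))
    return gpus

def has_free_slot(gpus, required_mib):
    """True if at least one GPU has `required_mib` MiB of memory free."""
    return any(total - used >= required_mib for used, total in gpus)
```

Run the `nvidia-smi` query via `subprocess.check_output` and feed its stdout to `parse_gpu_memory`; if `has_free_slot` returns False, stop the process holding the GPU (in this thread, Ollama) before running `xinference launch`.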

songleipu123 commented 21 hours ago

> The GPU is occupied by other models; stop them first, then launch.
>
> When using an LLM together with embedding/rerank models, load the embedding/rerank models first, then the LLM.

Indeed. I stopped Ollama, restarted Xinference, and then the download and launch succeeded. Xinference has to be started first. Many thanks!