xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
4.77k stars 375 forks

【BUG】Launching a model fails with: Server error: 503 - [address=0.0.0.0:30891, pid=146681] No available slot found for the model #1586

Open vball opened 3 months ago

vball commented 3 months ago

I have been unable to get any model working.

qinxuye commented 3 months ago

This error means the GPU is already occupied. Currently, one GPU can only be allocated to a single LLM.

It may be that an earlier failed launch left the GPU occupied. Please restart and try again with a smaller model size.

vball commented 3 months ago

[screenshot] I restarted and tried again through the UI panel: the first attempt timed out with no response, and the second again reported no available slot. @qinxuye

qinxuye commented 3 months ago

There are no models listed under Running models either?

vball commented 3 months ago

> There are no models listed under Running models either?

No, there aren't. [screenshot]

qinxuye commented 3 months ago

The model load timed out. Can you restart, retry, and paste the logs?

vball commented 3 months ago

> The model load timed out. Can you restart, retry, and paste the logs?

It reports the error below. @qinxuye

```
2024-06-06 09:29:00,501 xinference.model.llm.llm_family 25577 INFO Caching from Hugging Face: baichuan-inc/Baichuan2-7B-Base
pytorch_model-00001-of-00002.bin: 100%|██████████| 9.93G/9.93G [03:32<00:00, 13.9MB/s]
Fetching 18 files: 100%|██████████| 18/18 [03:35<00:00, 11.95s/it]
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
2024-06-06 09:32:37,011 xinference.core.worker 25577 ERROR Failed to load model baichuan-2-1-0
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 664, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 239, in load
    self._model.load()
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/pytorch/core.py", line 251, in load
    self._model, self._tokenizer = self._load_model(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/pytorch/core.py", line 127, in _load_model
    tokenizer = AutoTokenizer.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 865, in from_pretrained
    return tokenizer_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2110, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py
```

vball commented 3 months ago

@qinxuye Are you there? What could the exception above indicate? I still haven't been able to get any model running.

jony4 commented 2 months ago

https://github.com/xorbitsai/inference/issues/888#issuecomment-2198456521

likenamehaojie commented 2 months ago

I ran into the same problem. It happened while I was deploying ChatTTS, and at the same time I had a whisper model running.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.