xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Error when running xinference-local --host 0.0.0.0 --port 9997 after installation #1835

Open pan-common opened 1 month ago

pan-common commented 1 month ago

System Info

Ubuntu 20.04
NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2

Running Xinference with Docker?

Version info

Name: xinference
Version: 0.13.0
Summary: Model Serving Made Easy
Home-page: https://github.com/xorbitsai/inference
Author: Qin Xuye
Author-email: qinxuye@xprobe.io
License: Apache License 2.0
Location: /root/anaconda3/envs/py311/lib/python3.11/site-packages
Requires: aioprometheus, async-timeout, click, fastapi, fsspec, gradio, huggingface-hub, modelscope, openai, opencv-contrib-python, passlib, peft, pillow, pydantic, pynvml, python-jose, requests, s3fs, sse-starlette, tabulate, timm, torch, tqdm, typer, typing-extensions, uvicorn, xoscar
Required-by:

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 9997

Reproduction

(py311) root@b721c068038e:/opt/xinference# xinference-local --host 0.0.0.0 --port 9997
2024-07-10 12:28:08,395 xinference.core.supervisor 83095 INFO Xinference supervisor 0.0.0.0:44062 started
2024-07-10 12:28:08,425 xinference.core.worker 83095 INFO Starting metrics export server at 0.0.0.0:None
2024-07-10 12:28:08,431 xinference.core.worker 83095 INFO Checking metrics export server...
2024-07-10 12:28:09,600 xinference.core.worker 83095 INFO Metrics server is started at: http://0.0.0.0:41815
2024-07-10 12:28:09,601 xinference.core.worker 83095 INFO Xinference worker 0.0.0.0:44062 started
2024-07-10 12:28:09,602 xinference.core.worker 83095 INFO Purge cache directory: /root/.xinference/cache
2024-07-10 12:28:11,604 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
    status = await asyncio.to_thread(gather_node_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 799, in report_status
    async with timeout(2):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
TimeoutError
2024-07-10 12:28:14,296 xinference.api.restful_api 82961 INFO Starting Xinference at endpoint: http://0.0.0.0:9997
2024-07-10 12:28:14,648 uvicorn.error 82961 INFO Uvicorn running on http://0.0.0.0:9997 (Press CTRL+C to quit)
2024-07-10 12:28:18,618 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
    status = await asyncio.to_thread(gather_node_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 799, in report_status
    async with timeout(2):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
TimeoutError
2024-07-10 12:28:25,628 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
    status = await asyncio.to_thread(gather_node_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:
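The failing pattern in the tracebacks above (a blocking gather_node_info call dispatched via asyncio.to_thread inside a 2-second timeout in worker.py) can be sketched as follows. This is a minimal reproduction, not Xinference's actual code: `gather_node_info_stub` is a hypothetical stand-in that simply blocks longer than the timeout, and the sketch uses the stdlib `asyncio.wait_for` in place of the `async_timeout` package's `timeout(2)` context manager, which behaves equivalently here.

```python
import asyncio
import time


def gather_node_info_stub():
    # Hypothetical stand-in for xinference's gather_node_info, which
    # collects node metrics; here it just blocks past the deadline.
    time.sleep(3)
    return {"status": "ok"}


async def report_status():
    # Mirrors the worker pattern: run the blocking call in a thread,
    # bounded by a 2-second timeout. When the deadline fires, the await
    # is cancelled (CancelledError inside), then TimeoutError surfaces --
    # the same pairing seen in the logs above.
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(gather_node_info_stub), timeout=2
        )
    except asyncio.TimeoutError:
        # This is the point at which "Report status got error." is logged.
        return None


result = asyncio.run(report_status())
print(result)  # -> None
```

Note that the worker thread itself is not killed by the timeout; it keeps running to completion in the background, which is why a slow or hung metrics probe can make the worker miss every reporting deadline without crashing.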

Expected behavior

It should start normally and be able to run models on the GPU.

ChengjieLi28 commented 1 month ago

@pan-common The worker hit an error while reporting its status to the supervisor. First, enable debug logging to see whether there is a more specific error (also, the log you posted is incomplete — please paste the full output, including everything after "During handling of the above exception, another exception occurred:"). Then you can bypass the status-report flow like this and see whether it starts:

XINFERENCE_DISABLE_HEALTH_CHECK=1 xinference-local --host 0.0.0.0 --port 9997
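For completeness, a hypothetical troubleshooting sequence combining the two suggestions (debug logging plus skipping the health check). The `--log-level debug` option is assumed to be supported by `xinference-local`, and the `curl` check uses the REST endpoint printed in the logs above:

```shell
# 1. Restart with debug logs and the status-report loop bypassed
XINFERENCE_DISABLE_HEALTH_CHECK=1 xinference-local \
    --host 0.0.0.0 --port 9997 --log-level debug

# 2. From a second shell, verify the REST API is answering
curl http://127.0.0.1:9997/v1/models
```

If the server comes up cleanly with the health check disabled, the root cause is most likely in the node-info gathering step (e.g. a hanging GPU/driver query), not in the API server itself.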
github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.
