xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Error when running xinference-local --host 0.0.0.0 --port 9997 after installation #1835

Open pan-common opened 1 month ago

pan-common commented 1 month ago

System Info

Ubuntu 20.04
NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2

Running Xinference with Docker?

Version info

Name: xinference
Version: 0.13.0
Summary: Model Serving Made Easy
Home-page: https://github.com/xorbitsai/inference
Author: Qin Xuye
Author-email: qinxuye@xprobe.io
License: Apache License 2.0
Location: /root/anaconda3/envs/py311/lib/python3.11/site-packages
Requires: aioprometheus, async-timeout, click, fastapi, fsspec, gradio, huggingface-hub, modelscope, openai, opencv-contrib-python, passlib, peft, pillow, pydantic, pynvml, python-jose, requests, s3fs, sse-starlette, tabulate, timm, torch, tqdm, typer, typing-extensions, uvicorn, xoscar
Required-by:

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 9997

Reproduction

(py311) root@b721c068038e:/opt/xinference# xinference-local --host 0.0.0.0 --port 9997
2024-07-10 12:28:08,395 xinference.core.supervisor 83095 INFO Xinference supervisor 0.0.0.0:44062 started
2024-07-10 12:28:08,425 xinference.core.worker 83095 INFO Starting metrics export server at 0.0.0.0:None
2024-07-10 12:28:08,431 xinference.core.worker 83095 INFO Checking metrics export server...
2024-07-10 12:28:09,600 xinference.core.worker 83095 INFO Metrics server is started at: http://0.0.0.0:41815
2024-07-10 12:28:09,601 xinference.core.worker 83095 INFO Xinference worker 0.0.0.0:44062 started
2024-07-10 12:28:09,602 xinference.core.worker 83095 INFO Purge cache directory: /root/.xinference/cache
2024-07-10 12:28:11,604 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
    status = await asyncio.to_thread(gather_node_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 799, in report_status
    async with timeout(2):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
TimeoutError
2024-07-10 12:28:14,296 xinference.api.restful_api 82961 INFO Starting Xinference at endpoint: http://0.0.0.0:9997
2024-07-10 12:28:14,648 uvicorn.error 82961 INFO Uvicorn running on http://0.0.0.0:9997 (Press CTRL+C to quit)
2024-07-10 12:28:18,618 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
    status = await asyncio.to_thread(gather_node_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 799, in report_status
    async with timeout(2):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 141, in __aexit__
    self._do_exit(exc_type)
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/async_timeout/__init__.py", line 228, in _do_exit
    raise asyncio.TimeoutError
TimeoutError
2024-07-10 12:28:25,628 xinference.core.worker 83095 ERROR Report status got error.
Traceback (most recent call last):
  File "/root/anaconda3/envs/py311/lib/python3.11/site-packages/xinference/core/worker.py", line 800, in report_status
    status = await asyncio.to_thread(gather_node_info)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/py311/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:
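The failing pattern in the tracebacks above (a blocking gather_node_info call dispatched via asyncio.to_thread inside a 2-second timeout in worker.py) can be sketched as follows. This is a minimal reproduction, not Xinference's actual code: `gather_node_info_stub` is a hypothetical stand-in that simply blocks longer than the timeout, and the sketch uses the stdlib `asyncio.wait_for` in place of the `async_timeout` package's `timeout(2)` context manager, which behaves equivalently here.

```python
import asyncio
import time


def gather_node_info_stub():
    # Hypothetical stand-in for xinference's gather_node_info, which
    # collects node metrics; here it just blocks past the deadline.
    time.sleep(3)
    return {"status": "ok"}


async def report_status():
    # Mirrors the worker pattern: run the blocking call in a thread,
    # bounded by a 2-second timeout. When the deadline fires, the await
    # is cancelled (CancelledError inside), then TimeoutError surfaces --
    # the same pairing seen in the logs above.
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(gather_node_info_stub), timeout=2
        )
    except asyncio.TimeoutError:
        # This is the point at which "Report status got error." is logged.
        return None


result = asyncio.run(report_status())
print(result)  # -> None
```

Note that the worker thread itself is not killed by the timeout; it keeps running to completion in the background, which is why a slow or hung metrics probe can make the worker miss every reporting deadline without crashing.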

Expected behavior

It should start normally and be able to run models on the GPU.

ChengjieLi28 commented 1 month ago

@pan-common The worker hit an error while reporting its status to the supervisor. First, enable debug logging to see whether there is a more specific error (also, the log you posted is incomplete — please paste the full output, including everything after "During handling of the above exception, another exception occurred:"). Then you can bypass the status-report flow like this and see whether it starts:

XINFERENCE_DISABLE_HEALTH_CHECK=1 xinference-local --host 0.0.0.0 --port 9997
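For completeness, a hypothetical troubleshooting sequence combining the two suggestions (debug logging plus skipping the health check). The `--log-level debug` option is assumed to be supported by `xinference-local`, and the `curl` check uses the REST endpoint printed in the logs above:

```shell
# 1. Restart with debug logs and the status-report loop bypassed
XINFERENCE_DISABLE_HEALTH_CHECK=1 xinference-local \
    --host 0.0.0.0 --port 9997 --log-level debug

# 2. From a second shell, verify the REST API is answering
curl http://127.0.0.1:9997/v1/models
```

If the server comes up cleanly with the health check disabled, the root cause is most likely in the node-info gathering step (e.g. a hanging GPU/driver query), not in the API server itself.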
github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.
