xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

Model Engine: launching qwen2.5-32b-instruct fails with both vLLM and Transformers #2486

Open andylzming opened 2 days ago

andylzming commented 2 days ago

System Info

Python version: 3.10.6

[root@gpu-server ~]# nvidia-smi
Fri Oct 25 15:24:01 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800 80G...  Off  | 00000000:4B:00.0 Off |                    0 |
| N/A   75C    P0   118W / 300W |   3840MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800 80G...  Off  | 00000000:B1:00.0 Off |                    0 |
| N/A   45C    P0    48W / 300W |      2MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11474      C   java                             3838MiB |
+-----------------------------------------------------------------------------+

Running Xinference with Docker? No; Xinference is started directly with `xinference-local` (see the launch command below).

Version info

(xinference) [root@gpu-server ~]# pip list
Package                           Version
--------------------------------- ------------
accelerate                        0.29.1
addict                            2.4.0
aiobotocore                       2.7.0
aiofiles                          23.2.1
aiohttp                           3.9.1
aioitertools                      0.11.0
aioprometheus                     23.12.0
aiosignal                         1.3.1
aliyun-python-sdk-core            2.14.0
aliyun-python-sdk-kms             2.16.2
altair                            5.2.0
annotated-types                   0.7.0
anyio                             3.7.1
asttokens                         2.4.1
async-timeout                     4.0.3
attrs                             23.1.0
auto-gptq                         0.6.0
av                                12.3.0
bcrypt                            4.1.2
bitsandbytes                      0.41.3.post2
botocore                          1.31.64
certifi                           2023.11.17
cffi                              1.16.0
charset-normalizer                3.3.2
chatglm-cpp                       0.3.0
click                             8.1.7
cloudpickle                       3.0.0
cmake                             3.28.1
colorama                          0.4.6
coloredlogs                       15.0.1
comm                              0.2.0
contourpy                         1.2.0
crcmod                            1.7
cryptography                      41.0.7
ctransformers                     0.2.27
cycler                            0.12.1
datasets                          2.15.0
debugpy                           1.8.0
decorator                         5.1.1
dill                              0.3.7
diskcache                         5.6.3
distro                            1.8.0
ecdsa                             0.18.0
einops                            0.7.0
exceptiongroup                    1.2.0
executing                         2.0.1
fastapi                           0.110.3
ffmpy                             0.3.1
filelock                          3.13.1
fonttools                         4.47.0
frozenlist                        1.4.1
fsspec                            2023.10.0
gast                              0.5.4
gekko                             1.0.6
gradio                            4.26.0
gradio_client                     0.15.1
h11                               0.14.0
httpcore                          1.0.2
httptools                         0.6.1
httpx                             0.25.2
huggingface-hub                   0.24.6
humanfriendly                     10.0
idna                              3.6
importlib-metadata                7.0.0
importlib-resources               6.1.1
interegular                       0.3.3
ipykernel                         6.26.0
ipython                           8.17.2
jedi                              0.19.1
Jinja2                            3.1.2
jiter                             0.6.1
jmespath                          0.10.0
joblib                            1.3.2
jsonschema                        4.20.0
jsonschema-specifications         2023.11.2
jupyter_client                    8.6.0
jupyter_core                      5.5.0
kiwisolver                        1.4.5
lark                              1.1.9
linkify-it-py                     2.0.2
llama_cpp_python                  0.2.25
llvmlite                          0.42.0
lm-format-enforcer                0.9.8
markdown-it-py                    2.2.0
MarkupSafe                        2.1.3
matplotlib                        3.8.2
matplotlib-inline                 0.1.6
mdit-py-plugins                   0.3.3
mdurl                             0.1.2
modelscope                        1.10.0
mpmath                            1.3.0
msgpack                           1.0.7
multidict                         6.0.4
multiprocess                      0.70.15
nest-asyncio                      1.5.8
networkx                          3.2.1
ninja                             1.11.1.1
nltk                              3.8.1
numba                             0.59.1
numpy                             1.26.2
nvidia-cublas-cu12                12.1.3.1
nvidia-cuda-cupti-cu12            12.1.105
nvidia-cuda-nvrtc-cu12            12.1.105
nvidia-cuda-runtime-cu12          12.1.105
nvidia-cudnn-cu12                 8.9.2.26
nvidia-cufft-cu12                 11.0.2.54
nvidia-curand-cu12                10.3.2.106
nvidia-cusolver-cu12              11.4.5.107
nvidia-cusparse-cu12              12.1.0.106
nvidia-ml-py                      12.550.52
nvidia-nccl-cu12                  2.20.5
nvidia-nvjitlink-cu12             12.3.101
nvidia-nvtx-cu12                  12.1.105
openai                            1.50.1
opencv-contrib-python             4.10.0.82
opencv-python                     4.9.0.80
optimum                           1.16.1
orjson                            3.9.10
oss2                              2.18.3
outlines                          0.0.34
packaging                         23.2
pandas                            2.1.4
parso                             0.8.3
passlib                           1.7.4
peft                              0.7.1
pexpect                           4.8.0
Pillow                            10.1.0
pip                               23.3
platformdirs                      4.1.0
prometheus_client                 0.20.0
prometheus-fastapi-instrumentator 7.0.0
prompt-toolkit                    3.0.40
protobuf                          4.25.1
psutil                            5.9.7
ptyprocess                        0.7.0
pure-eval                         0.2.2
py-cpuinfo                        9.0.0
pyarrow                           14.0.2
pyarrow-hotfix                    0.6
pyasn1                            0.5.1
pycparser                         2.21
pycryptodome                      3.19.0
pydantic                          2.6.4
pydantic_core                     2.16.3
pydub                             0.25.1
Pygments                          2.16.1
pynvml                            11.5.0
pyparsing                         3.1.1
python-dateutil                   2.8.2
python-dotenv                     1.0.0
python-jose                       3.3.0
python-multipart                  0.0.9
pytz                              2023.3.post1
PyYAML                            6.0.1
pyzmq                             25.1.1
quantile-python                   1.1
qwen-vl-utils                     0.0.8
ray                               2.9.3
referencing                       0.32.0
regex                             2023.10.3
requests                          2.31.0
rich                              13.7.1
rouge                             1.0.1
rpds-py                           0.15.2
rsa                               4.9
ruff                              0.4.6
s3fs                              2023.10.0
safetensors                       0.4.1
scikit-learn                      1.3.2
scipy                             1.11.4
semantic-version                  2.10.0
sentence-transformers             2.7.0
sentencepiece                     0.1.99
setuptools                        69.0.2
shellingham                       1.5.4
simplejson                        3.19.2
six                               1.16.0
sniffio                           1.3.0
sortedcontainers                  2.4.0
sse-starlette                     1.8.2
stack-data                        0.6.3
starlette                         0.37.2
sympy                             1.12
tabulate                          0.9.0
tblib                             3.0.0
threadpoolctl                     3.2.0
tiktoken                          0.6.0
timm                              0.9.16
tokenizers                        0.20.1
tomli                             2.0.1
tomlkit                           0.12.0
toolz                             0.12.0
torch                             2.3.0+cu121
torchaudio                        2.3.0+cu121
torchvision                       0.18.0+cu121
tornado                           6.3.3
tqdm                              4.66.1
traitlets                         5.13.0
transformers                      4.45.1
transformers-stream-generator     0.0.4
triton                            2.3.0
typer                             0.11.1
typing_extensions                 4.12.2
tzdata                            2023.3
uc-micro-py                       1.0.2
urllib3                           2.0.7
uvicorn                           0.24.0.post1
uvloop                            0.19.0
vllm                              0.4.2
vllm-nccl-cu12                    2.18.1.0.4.0
watchfiles                        0.21.0
wcwidth                           0.2.9
websockets                        11.0.3
wheel                             0.41.2
wrapt                             1.16.0
xformers                          0.0.26.post1
xinference                        0.16.0
xinference-client                 0.16.0
xoscar                            0.3.0
xxhash                            3.4.1
yapf                              0.40.2
yarl                              1.9.4
zipp                              3.17.0

The command used to start Xinference

nohup xinference-local -H 172.22.149.188 -p 59997 &
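For reference, an equivalent launch that pins GPU visibility for the server and its child vLLM/Ray workers and keeps the server log is sketched below; the device indices are taken from the `nvidia-smi` output above, and the log path is an arbitrary choice:

```bash
# Sketch: expose both A800s explicitly and capture the server log for debugging.
export CUDA_VISIBLE_DEVICES=0,1
nohup xinference-local -H 172.22.149.188 -p 59997 > xinference.log 2>&1 &
```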

Reproduction

  1. Launching qwen2.5-32b-instruct with the Transformers engine succeeds, but chatting with the model then raises an error (screenshot in the original report).

  2. Launching qwen2.5-32b-instruct with the vLLM engine times out with an error; the full log follows the launch sketch below.
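For context, launching a 32B model on both GPUs through the Xinference CLI looks roughly like this; the model name mirrors the `custom-qwen25-32b-instruct` uid in the logs below, and the exact flag set is an approximation that may vary across Xinference versions:

```bash
# Sketch: request the vLLM engine with tensor parallelism across 2 GPUs.
xinference launch \
  --endpoint http://172.22.149.188:59997 \
  --model-name custom-qwen25-32b-instruct \
  --model-engine vllm \
  --model-format pytorch \
  --size-in-billions 32 \
  --n-gpu 2
```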

INFO 10-25 12:11:07 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='/home/models/Qwen25-32B-Instruct', speculative_config=None, tokenizer='/home/models/Qwen25-32B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=/home/models/Qwen25-32B-Instruct)
INFO 10-25 12:11:12 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=49552) INFO 10-25 12:11:12 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 10-25 12:11:13 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
INFO 10-25 12:11:13 selector.py:32] Using XFormers backend.
(RayWorkerWrapper pid=49552) INFO 10-25 12:11:13 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
(RayWorkerWrapper pid=49552) INFO 10-25 12:11:13 selector.py:32] Using XFormers backend.
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] Traceback (most recent call last):
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 104, in init_device
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]     _check_if_gpu_supports_dtype(self.model_config.dtype)
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]     compute_capability = torch.cuda.get_device_capability()
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]     prop = get_device_properties(device)
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/cuda/__init__.py", line 447, in get_device_properties
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145]     raise AssertionError("Invalid device id")
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] AssertionError: Invalid device id
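The `AssertionError: Invalid device id` means the Ray worker asked PyTorch about a GPU index that is not visible to its process, which typically points at GPU visibility (e.g. `CUDA_VISIBLE_DEVICES`) not propagating to the worker. A minimal sketch that reproduces the same assertion outside Xinference; the one-visible-GPU restriction is an assumed stand-in for the worker's environment:

```bash
# With only GPU 0 visible, querying device index 1 raises the same
# "AssertionError: Invalid device id" seen in the worker log.
CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.get_device_capability(1))"
```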
ERROR 10-25 12:21:14 worker_base.py:145] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 10-25 12:21:14 worker_base.py:145] Traceback (most recent call last):
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
ERROR 10-25 12:21:14 worker_base.py:145]     return executor(*args, **kwargs)
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 111, in init_device
ERROR 10-25 12:21:14 worker_base.py:145]     init_worker_distributed_environment(self.parallel_config, self.rank,
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 288, in init_worker_distributed_environment
ERROR 10-25 12:21:14 worker_base.py:145]     init_distributed_environment(parallel_config.world_size, rank,
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 70, in init_distributed_environment
ERROR 10-25 12:21:14 worker_base.py:145]     torch.distributed.init_process_group(
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
ERROR 10-25 12:21:14 worker_base.py:145]     return func(*args, **kwargs)
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
ERROR 10-25 12:21:14 worker_base.py:145]     func_return = func(*args, **kwargs)
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
ERROR 10-25 12:21:14 worker_base.py:145]     store, rank, world_size = next(rendezvous_iterator)
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
ERROR 10-25 12:21:14 worker_base.py:145]     store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
ERROR 10-25 12:21:14 worker_base.py:145]   File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
ERROR 10-25 12:21:14 worker_base.py:145]     return TCPStore(
ERROR 10-25 12:21:14 worker_base.py:145] torch.distributed.DistStoreError: Timed out after 601 seconds waiting for clients. 1/2 clients joined.
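The rendezvous timeout is a consequence of the failure above: the Ray worker for GPU 1 died during `init_device`, so only the driver (1 of 2 clients) ever joined the `TCPStore`, and tensor parallelism across the two cards never formed. A quick sanity check that both GPUs are actually visible to the Python environment (a diagnostic sketch, not part of the original report):

```bash
# Both commands should report 2 on this two-A800 host; a lower number
# from torch points at CUDA_VISIBLE_DEVICES or a driver/runtime mismatch.
nvidia-smi --list-gpus | wc -l
python -c "import torch; print(torch.cuda.device_count())"
```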
2024-10-25 12:21:14,825 xinference.core.worker 38651 ERROR    Failed to load model custom-qwen25-32b-instruct-1-0
Traceback (most recent call last):
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 894, in launch_builtin_model
    await model_ref.load()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in load
    self._model.load()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 261, in load
    self._engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
    engine = cls(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
    return engine_class(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
    self.model_executor = executor_class(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 300, in __init__
    super().__init__(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 43, in _init_executor
    self._init_workers_ray(placement_group)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 164, in _init_workers_ray
    self._run_workers("init_device")
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 234, in _run_workers
    driver_worker_output = self.driver_worker.execute_method(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 146, in execute_method
    raise e
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
    return executor(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 111, in init_device
    init_worker_distributed_environment(self.parallel_config, self.rank,
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 288, in init_worker_distributed_environment
    init_distributed_environment(parallel_config.world_size, rank,
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 70, in init_distributed_environment
    torch.distributed.init_process_group(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
    func_return = func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
    store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
    return TCPStore(
torch.distributed.DistStoreError: [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined.
2024-10-25 12:21:14,970 xinference.core.worker 38651 ERROR    [request b9f07de0-92eb-11ef-ad67-80615f20f615] Leave launch_builtin_model, error: [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined., elapsed time: 614 s
Traceback (most recent call last):
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 894, in launch_builtin_model
    await model_ref.load()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in load
    self._model.load()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 261, in load
    self._engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
    engine = cls(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
    return engine_class(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
    self.model_executor = executor_class(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 300, in __init__
    super().__init__(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 43, in _init_executor
    self._init_workers_ray(placement_group)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 164, in _init_workers_ray
    self._run_workers("init_device")
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 234, in _run_workers
    driver_worker_output = self.driver_worker.execute_method(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 146, in execute_method
    raise e
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
    return executor(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 111, in init_device
    init_worker_distributed_environment(self.parallel_config, self.rank,
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 288, in init_worker_distributed_environment
    init_distributed_environment(parallel_config.world_size, rank,
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 70, in init_distributed_environment
    torch.distributed.init_process_group(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
    func_return = func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
    store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
    return TCPStore(
torch.distributed.DistStoreError: [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined.
2024-10-25 12:21:14,974 xinference.api.restful_api 37502 ERROR    [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined.
Traceback (most recent call last):
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 977, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1040, in launch_builtin_model
    await _launch_model()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1004, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 983, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 894, in launch_builtin_model
    await model_ref.load()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in load
    self._model.load()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 261, in load
    self._engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
    engine = cls(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
    return engine_class(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
    self.model_executor = executor_class(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 300, in __init__
    super().__init__(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 43, in _init_executor
    self._init_workers_ray(placement_group)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 164, in _init_workers_ray
    self._run_workers("init_device")
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 234, in _run_workers
    driver_worker_output = self.driver_worker.execute_method(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 146, in execute_method
    raise e
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
    return executor(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 111, in init_device
    init_worker_distributed_environment(self.parallel_config, self.rank,
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 288, in init_worker_distributed_environment
    init_distributed_environment(parallel_config.world_size, rank,
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 70, in init_distributed_environment
    torch.distributed.init_process_group(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
    func_return = func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
    store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
    return TCPStore(
torch.distributed.DistStoreError: [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined.
2024-10-25 12:22:12,510 xinference.core.supervisor 38651 ERROR    [request 4a696e76-92ed-11ef-ad67-80615f20f615] Leave get_model, error: Model not found in the model list, uid: custom-qwen2-vl-7b-instruct, elapsed time: 0 s
Traceback (most recent call last):
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1137, in get_model
    raise ValueError(f"Model not found in the model list, uid: {model_uid}")
ValueError: Model not found in the model list, uid: custom-qwen2-vl-7b-instruct
2024-10-25 12:22:12,512 xinference.api.restful_api 37502 ERROR    [address=172.22.149.188:61160, pid=38651] Model not found in the model list, uid: custom-qwen2-vl-7b-instruct
Traceback (most recent call last):
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1856, in create_chat_completion
    model = await (await self._get_supervisor_ref()).get_model(model_uid)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1137, in get_model
    raise ValueError(f"Model not found in the model list, uid: {model_uid}")
ValueError: [address=172.22.149.188:61160, pid=38651] Model not found in the model list, uid: custom-qwen2-vl-7b-instruct
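Since the vLLM launch never completed, the later chat request fails because no model with the uid `custom-qwen2-vl-7b-instruct` is in the server's model list. Listing what the server is actually running confirms this; a sketch against the OpenAI-compatible endpoint, using the host and port from the start command:

```bash
# An empty "data" array here explains the "Model not found" error above.
curl -s http://172.22.149.188:59997/v1/models
```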

Expected behavior

qwen2.5-32b-instruct should launch and serve chat requests without error under both the Transformers and vLLM engines.
