Chat completion with the Transformers backend fails with: probability tensor contains either `inf`, `nan` or element < 0
Error log
Loading checkpoint shards: 100%|███████████████| 17/17 [00:23<00:00, 1.38s/it]
2024-10-25 15:28:56,618 xinference.core.worker 38651 INFO [request 4db91eb8-9307-11ef-ad67-80615f20f615] Leave launch_builtin_model, elapsed time: 31 s
2024-10-25 15:29:44,187 transformers.models.qwen2.modeling_qwen2 63636 WARNING Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
2024-10-25 15:29:44,197 xinference.model.llm.transformers.utils 63636 ERROR Internal error for batch inference: probability tensor contains either `inf`, `nan` or element < 0.
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 483, in batch_inference_one_step
_batch_inference_one_step_internal(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 286, in _batch_inference_one_step_internal
token = _get_token_from_logits(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 111, in _get_token_from_logits
indices = torch.multinomial(probs, num_samples=2)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
2024-10-25 15:29:44,281 xinference.api.restful_api 37502 ERROR Chat completion stream got an error: [address=172.22.149.188:44411, pid=63636] probability tensor contains either `inf`, `nan` or element < 0
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1926, in stream_results
async for item in iterator:
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 340, in __anext__
return await self._actor_ref.__xoscar_next__(self._uid)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 431, in __xoscar_next__
raise e
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 419, in __xoscar_next__
r = await asyncio.create_task(_async_wrapper(gen))
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 409, in _async_wrapper
return await _gen.__anext__() # noqa: F821
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 440, in _to_async_gen
async for v in gen:
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 568, in _queue_consumer
raise RuntimeError(res[len(XINFERENCE_STREAMING_ERROR_FLAG) :])
RuntimeError: [address=172.22.149.188:44411, pid=63636] probability tensor contains either `inf`, `nan` or element < 0
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/queueing.py", line 527, in process_events
response = await route_utils.call_process_api(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api
output = await app.get_blocks().process_api(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/blocks.py", line 1786, in process_api
result = await self.call_function(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/blocks.py", line 1350, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.__anext__()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/utils.py", line 709, in asyncgen_wrapper
response = await iterator.__anext__()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/chat_interface.py", line 545, in _stream_fn
first_response = await async_iteration(generator)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.__anext__()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/utils.py", line 576, in __anext__
return await anyio.to_thread.run_sync(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/gradio/utils.py", line 559, in run_sync_iterator_async
return next(iterator)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/chat_interface.py", line 122, in generate_wrapper
for chunk in model.chat(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/client/common.py", line 51, in streaming_response_iterator
raise Exception(str(error))
Exception: [address=172.22.149.188:44411, pid=63636] probability tensor contains either `inf`, `nan` or element < 0
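For context, the RuntimeError above is raised by torch.multinomial whenever the probability tensor it receives already contains NaN or Inf; with LLMs this typically happens when fp16/bf16 logits overflow upstream. A minimal sketch, independent of Xinference, that reproduces the same message:

import torch

# Well-formed probabilities sample without complaint.
good = torch.softmax(torch.tensor([1.0, 2.0, 3.0]), dim=-1)
print(torch.multinomial(good, num_samples=2))

# One overflowed logit turns the whole softmax output into NaN,
# and multinomial then fails exactly as in the log above.
bad = torch.softmax(torch.tensor([float("inf"), 2.0, 3.0]), dim=-1)
try:
    torch.multinomial(bad, num_samples=2)
except RuntimeError as e:
    print(e)  # probability tensor contains either `inf`, `nan` or element < 0

If that is the mechanism here, reloading the model in float32 or with a different quantization is the usual first experiment; whether it applies to this particular model is an assumption, not something the log proves.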
Launching qwen2.5-32b-instruct with vLLM times out with an error
Screenshot (image not reproduced here)
Error log
INFO 10-25 12:11:07 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='/home/models/Qwen25-32B-Instruct', speculative_config=None, tokenizer='/home/models/Qwen25-32B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=/home/models/Qwen25-32B-Instruct)
INFO 10-25 12:11:12 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=49552) INFO 10-25 12:11:12 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 10-25 12:11:13 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
INFO 10-25 12:11:13 selector.py:32] Using XFormers backend.
(RayWorkerWrapper pid=49552) INFO 10-25 12:11:13 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
(RayWorkerWrapper pid=49552) INFO 10-25 12:11:13 selector.py:32] Using XFormers backend.
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] Traceback (most recent call last):
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] return executor(*args, **kwargs)
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 104, in init_device
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] _check_if_gpu_supports_dtype(self.model_config.dtype)
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 321, in _check_if_gpu_supports_dtype
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] compute_capability = torch.cuda.get_device_capability()
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] prop = get_device_properties(device)
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/cuda/__init__.py", line 447, in get_device_properties
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] raise AssertionError("Invalid device id")
(RayWorkerWrapper pid=49552) ERROR 10-25 12:11:13 worker_base.py:145] AssertionError: Invalid device id
ERROR 10-25 12:21:14 worker_base.py:145] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 10-25 12:21:14 worker_base.py:145] Traceback (most recent call last):
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
ERROR 10-25 12:21:14 worker_base.py:145] return executor(*args, **kwargs)
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 111, in init_device
ERROR 10-25 12:21:14 worker_base.py:145] init_worker_distributed_environment(self.parallel_config, self.rank,
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 288, in init_worker_distributed_environment
ERROR 10-25 12:21:14 worker_base.py:145] init_distributed_environment(parallel_config.world_size, rank,
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 70, in init_distributed_environment
ERROR 10-25 12:21:14 worker_base.py:145] torch.distributed.init_process_group(
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
ERROR 10-25 12:21:14 worker_base.py:145] return func(*args, **kwargs)
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
ERROR 10-25 12:21:14 worker_base.py:145] func_return = func(*args, **kwargs)
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
ERROR 10-25 12:21:14 worker_base.py:145] store, rank, world_size = next(rendezvous_iterator)
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
ERROR 10-25 12:21:14 worker_base.py:145] store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
ERROR 10-25 12:21:14 worker_base.py:145] File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
ERROR 10-25 12:21:14 worker_base.py:145] return TCPStore(
ERROR 10-25 12:21:14 worker_base.py:145] torch.distributed.DistStoreError: Timed out after 601 seconds waiting for clients. 1/2 clients joined.
2024-10-25 12:21:14,825 xinference.core.worker 38651 ERROR Failed to load model custom-qwen25-32b-instruct-1-0
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 894, in launch_builtin_model
await model_ref.load()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in load
self._model.load()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 261, in load
self._engine = AsyncLLMEngine.from_engine_args(engine_args)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
engine = cls(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
return engine_class(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
self.model_executor = executor_class(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 300, in __init__
super().__init__(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
self._init_executor()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 43, in _init_executor
self._init_workers_ray(placement_group)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 164, in _init_workers_ray
self._run_workers("init_device")
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 234, in _run_workers
driver_worker_output = self.driver_worker.execute_method(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 146, in execute_method
raise e
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
return executor(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 111, in init_device
init_worker_distributed_environment(self.parallel_config, self.rank,
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 288, in init_worker_distributed_environment
init_distributed_environment(parallel_config.world_size, rank,
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 70, in init_distributed_environment
torch.distributed.init_process_group(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
return func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
func_return = func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
return TCPStore(
torch.distributed.DistStoreError: [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined.
2024-10-25 12:21:14,970 xinference.core.worker 38651 ERROR [request b9f07de0-92eb-11ef-ad67-80615f20f615] Leave launch_builtin_model, error: [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined., elapsed time: 614 s
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 894, in launch_builtin_model
await model_ref.load()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in load
self._model.load()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 261, in load
self._engine = AsyncLLMEngine.from_engine_args(engine_args)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
engine = cls(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
return engine_class(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
self.model_executor = executor_class(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 300, in __init__
super().__init__(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
self._init_executor()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 43, in _init_executor
self._init_workers_ray(placement_group)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 164, in _init_workers_ray
self._run_workers("init_device")
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 234, in _run_workers
driver_worker_output = self.driver_worker.execute_method(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 146, in execute_method
raise e
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
return executor(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 111, in init_device
init_worker_distributed_environment(self.parallel_config, self.rank,
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 288, in init_worker_distributed_environment
init_distributed_environment(parallel_config.world_size, rank,
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 70, in init_distributed_environment
torch.distributed.init_process_group(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
return func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
func_return = func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
return TCPStore(
torch.distributed.DistStoreError: [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined.
2024-10-25 12:21:14,974 xinference.api.restful_api 37502 ERROR [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined.
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 977, in launch_model
model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1040, in launch_builtin_model
await _launch_model()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1004, in _launch_model
await _launch_one_model(rep_model_uid)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 983, in _launch_one_model
await worker_ref.launch_builtin_model(
File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
async with lock:
File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/worker.py", line 894, in launch_builtin_model
await model_ref.load()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in load
self._model.load()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 261, in load
self._engine = AsyncLLMEngine.from_engine_args(engine_args)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
engine = cls(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
return engine_class(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
self.model_executor = executor_class(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 300, in __init__
super().__init__(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
self._init_executor()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 43, in _init_executor
self._init_workers_ray(placement_group)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 164, in _init_workers_ray
self._run_workers("init_device")
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 234, in _run_workers
driver_worker_output = self.driver_worker.execute_method(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 146, in execute_method
raise e
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 137, in execute_method
return executor(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 111, in init_device
init_worker_distributed_environment(self.parallel_config, self.rank,
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/worker/worker.py", line 288, in init_worker_distributed_environment
init_distributed_environment(parallel_config.world_size, rank,
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 70, in init_distributed_environment
torch.distributed.init_process_group(
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
return func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 89, in wrapper
func_return = func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1305, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 199, in _tcp_rendezvous_handler
store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout, use_libuv)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 174, in _create_c10d_store
return TCPStore(
torch.distributed.DistStoreError: [address=172.22.149.188:45585, pid=43525] Timed out after 601 seconds waiting for clients. 1/2 clients joined.
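Reading the two failures above together: with tensor_parallel_size=2, one Ray worker died in init_device with AssertionError: Invalid device id (torch.cuda.get_device_capability() was asked about a GPU the process cannot see), so only one of the two workers reached the rendezvous, and the driver's TCPStore gave up after 601 seconds with 1/2 clients joined. A small diagnostic sketch, using plain PyTorch rather than anything Xinference-specific, to confirm what the environment actually sees:

import os
import torch

# tensor_parallel_size=2 needs at least two GPUs visible to every worker.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("device_count =", torch.cuda.device_count())

for i in range(torch.cuda.device_count()):
    # get_device_capability(i) is the same call that raised
    # "Invalid device id" inside the RayWorkerWrapper.
    print(i, torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))

If device_count comes back as 1, for example because CUDA_VISIBLE_DEVICES in the environment the Ray workers inherit names a single GPU, that alone would explain both the assertion and the timeout.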
2024-10-25 12:22:12,510 xinference.core.supervisor 38651 ERROR [request 4a696e76-92ed-11ef-ad67-80615f20f615] Leave get_model, error: Model not found in the model list, uid: custom-qwen2-vl-7b-instruct, elapsed time: 0 s
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1137, in get_model
raise ValueError(f"Model not found in the model list, uid: {model_uid}")
ValueError: Model not found in the model list, uid: custom-qwen2-vl-7b-instruct
2024-10-25 12:22:12,512 xinference.api.restful_api 37502 ERROR [address=172.22.149.188:61160, pid=38651] Model not found in the model list, uid: custom-qwen2-vl-7b-instruct
Traceback (most recent call last):
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1856, in create_chat_completion
model = await (await self._get_supervisor_ref()).get_model(model_uid)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/supervisor.py", line 1137, in get_model
raise ValueError(f"Model not found in the model list, uid: {model_uid}")
ValueError: [address=172.22.149.188:61160, pid=38651] Model not found in the model list, uid: custom-qwen2-vl-7b-instruct
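The final error is separate from the launch failure: a chat completion was requested for custom-qwen2-vl-7b-instruct, but no running model had that uid, so the supervisor's get_model raised ValueError. This is easy to check over Xinference's OpenAI-compatible REST API; the sketch below assumes the default port 9997 on the host seen in the log, so adjust both to your deployment:

import requests

base = "http://172.22.149.188:9997"  # supervisor host from the log; port assumed

# List the models that are actually launched right now...
print(requests.get(f"{base}/v1/models").json())

# ...then a chat completion against a uid missing from that list reproduces
# the "Model not found in the model list" error shown above.
resp = requests.post(
    f"{base}/v1/chat/completions",
    json={
        "model": "custom-qwen2-vl-7b-instruct",
        "messages": [{"role": "user", "content": "hello"}],
    },
)
print(resp.status_code, resp.text)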
System Info
Python Version: Python 3.10.6
Running Xinference with Docker?
Version info
The command used to start Xinference
Reproduction
Expected behavior
Resolve the issues reported above.