oldmikeyang opened this issue 3 months ago
Hi, I am currently investigating this issue. I will update this issue once I fix it.
Hi, this should have been fixed by PR: https://github.com/intel-analytics/ipex-llm/pull/11817
You can upgrade ipex-llm tomorrow and see if this works.
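If it helps, the nightly upgrade would look something like the sketch below (the extra index URL is the one commonly used for the ipex-llm XPU wheels; adjust it to whichever channel you installed from):
pip install --pre --upgrade "ipex-llm[xpu]" --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/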
With the latest IPEX-LLM, I get the following error during inference:
INFO 08-16 10:12:59 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 08-16 10:12:59 async_llm_engine.py:494] Received request cmpl-a50bf7e6bc264357815b2c77018ec28e-0: prompt: 'San Francisco is a', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: [23729, 12879, 374, 264], lora_request: None.
INFO 08-16 10:13:09 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 08-16 10:13:19 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
ERROR 08-16 10:13:19 async_llm_engine.py:41] Engine background task failed
ERROR 08-16 10:13:19 async_llm_engine.py:41] Traceback (most recent call last):
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 36, in _raise_exception_on_finish
ERROR 08-16 10:13:19 async_llm_engine.py:41] task.result()
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 467, in run_engine_loop
ERROR 08-16 10:13:19 async_llm_engine.py:41] has_requests_in_progress = await asyncio.wait_for(
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
ERROR 08-16 10:13:19 async_llm_engine.py:41] return fut.result()
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 441, in engine_step
ERROR 08-16 10:13:19 async_llm_engine.py:41] request_outputs = await self.engine.step_async()
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 211, in step_async
ERROR 08-16 10:13:19 async_llm_engine.py:41] output = await self.model_executor.execute_model_async(
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 443, in execute_model_async
ERROR 08-16 10:13:19 async_llm_engine.py:41] all_outputs = await self._run_workers_async(
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 433, in _run_workers_async
ERROR 08-16 10:13:19 async_llm_engine.py:41] all_outputs = await asyncio.gather(*coros)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/usr/lib/python3.11/asyncio/tasks.py", line 694, in _wrap_awaitable
ERROR 08-16 10:13:19 async_llm_engine.py:41] return (yield from awaitable.__await__())
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] ray.exceptions.RayTaskError(RuntimeError): ray::RayWorkerVllm.execute_method() (pid=195136, ip=10.240.108.91, actor_id=b933b7411289683bf7fc97c201000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x77b08879b6d0>)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/ray_utils.py", line 37, in execute_method
ERROR 08-16 10:13:19 async_llm_engine.py:41] return executor(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 08-16 10:13:19 async_llm_engine.py:41] return func(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/worker/worker.py", line 236, in execute_model
ERROR 08-16 10:13:19 async_llm_engine.py:41] output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 08-16 10:13:19 async_llm_engine.py:41] return func(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/worker/model_runner.py", line 581, in execute_model
ERROR 08-16 10:13:19 async_llm_engine.py:41] hidden_states = model_executable(
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return self._call_impl(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return forward_call(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 316, in forward
ERROR 08-16 10:13:19 async_llm_engine.py:41] hidden_states = self.model(input_ids, positions, kv_caches,
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return self._call_impl(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return forward_call(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 253, in forward
ERROR 08-16 10:13:19 async_llm_engine.py:41] hidden_states = self.embed_tokens(input_ids)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return self._call_impl(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return forward_call(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/model_executor/layers/vocab_parallel_embedding.py", line 107, in forward
ERROR 08-16 10:13:19 async_llm_engine.py:41] output_parallel[input_mask, :] = 0.0
ERROR 08-16 10:13:19 async_llm_engine.py:41] ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] RuntimeError: Allocation is out of device memory on current platform.
2024-08-16 10:13:19,835 - ERROR - Exception in callback functools.partial(<function _raise_exception_on_finish at 0x701317444040>, error_callback=<bound method AsyncLLMEngine._error_callback of <ipex_llm.vllm.xpu.engine.engine.IPEXLLMAsyncLLMEngine object at 0x7013133c7310>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x701317444040>, error_callback=<bound method AsyncLLMEngine._error_callback of <ipex_llm.vllm.xpu.engine.engine.IPEXLLMAsyncLLMEngine object at 0x7013133c7310>>)>
Traceback (most recent call last):
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 36, in _raise_exception_on_finish
task.result()
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 467, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
return fut.result()
^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 441, in engine_step
request_outputs = await self.engine.step_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 211, in step_async
output = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 443, in execute_model_async
all_outputs = await self._run_workers_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 433, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/tasks.py", line 694, in _wrap_awaitable
return (yield from awaitable.__await__())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(RuntimeError): ray::RayWorkerVllm.execute_method() (pid=195136, ip=10.240.108.91, actor_id=b933b7411289683bf7fc97c201000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x77b08879b6d0>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/ray_utils.py", line 37, in execute_method
return executor(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/worker/worker.py", line 236, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/worker/model_runner.py", line 581, in execute_model
hidden_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 316, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 253, in forward
hidden_states = self.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/layers/vocab_parallel_embedding.py", line 107, in forward
output_parallel[input_mask, :] = 0.0
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
RuntimeError: Allocation is out of device memory on current platform.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 43, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 08-16 10:13:19 async_llm_engine.py:152] Aborted request cmpl-a50bf7e6bc264357815b2c77018ec28e-0.
INFO: 127.0.0.1:44858 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 754, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 774, in app
await route.handle(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 295, in handle
await self.app(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 74, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 213, in create_completion
generator = await openai_serving_completion.create_completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/entrypoints/openai/serving_completion.py", line 179, in create_completion
async for i, res in result_generator:
File "/home/llm/vllm-ipex-forked/vllm/entrypoints/openai/serving_completion.py", line 82, in consumer
raise item
File "/home/llm/vllm-ipex-forked/vllm/entrypoints/openai/serving_completion.py", line 67, in producer
async for item in iterator:
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 625, in generate
raise e
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 619, in generate
async for request_output in stream:
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 75, in __anext__
raise result
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 36, in _raise_exception_on_finish
task.result()
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 467, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
return fut.result()
^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 441, in engine_step
request_outputs = await self.engine.step_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 211, in step_async
output = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 443, in execute_model_async
all_outputs = await self._run_workers_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 433, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/tasks.py", line 694, in _wrap_awaitable
return (yield from awaitable.__await__())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(RuntimeError): ray::RayWorkerVllm.execute_method() (pid=195136, ip=10.240.108.91, actor_id=b933b7411289683bf7fc97c201000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x77b08879b6d0>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/ray_utils.py", line 37, in execute_method
return executor(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/worker/worker.py", line 236, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/worker/model_runner.py", line 581, in execute_model
hidden_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 316, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 253, in forward
hidden_states = self.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/layers/vocab_parallel_embedding.py", line 107, in forward
output_parallel[input_mask, :] = 0.0
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
RuntimeError: Allocation is out of device memory on current platform.
INFO 08-16 10:13:29 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 08-16 10:13:39 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
2024:08:16-10:13:39:(196608) |CCL_ERROR| worker.cpp:353 ccl_worker_func: worker 6 caught internal exception: oneCCL: ze_call.cpp:43 do_call: EXCEPTION: ze error at zeCommandQueueExecuteCommandLists, code: ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY
[2024-08-16 10:13:39,930 E 191503 196608] logging.cc:108: Unhandled exception: N3ccl2v19exceptionE. what(): oneCCL: ze_call.cpp:43 do_call: EXCEPTION: ze error at zeCommandQueueExecuteCommandLists, code: ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY
[2024-08-16 10:13:39,938 E 191503 196608] logging.cc:115: Stack trace:
/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ray/_raylet.so(+0x10b7bea) [0x7013082b7bea] ray::operator<<()
/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ray/_raylet.so(+0x10bae72) [0x7013082bae72] ray::TerminateHandler()
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x70128c4ae20c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x70128c4ae277]
/opt/intel/1ccl-wks/lib/libccl.so.1(+0x4c26e9) [0x6fe1a54c26e9] ccl_worker_func()
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x70131d494ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x70131d526850]
*** SIGABRT received at time=1723774419 on cpu 41 ***
PC: @ 0x70131d4969fc (unknown) pthread_kill
@ 0x70131d442520 (unknown) (unknown)
[2024-08-16 10:13:39,938 E 191503 196608] logging.cc:440: *** SIGABRT received at time=1723774419 on cpu 41 ***
[2024-08-16 10:13:39,938 E 191503 196608] logging.cc:440: PC: @ 0x70131d4969fc (unknown) pthread_kill
[2024-08-16 10:13:39,939 E 191503 196608] logging.cc:440: @ 0x70131d442520 (unknown) (unknown)
Fatal Python error: Aborted
Extension modules: charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, sentencepiece._sentencepiece, PIL._imaging, PIL._imagingft, markupsafe._speedups, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, pyarrow.lib, pyarrow._json, httptools.parser.parser, httptools.parser.url_parser, websockets.speedups (total: 49)
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Silver 4410Y]
Registry and code: 13 MB
Command: python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server --served-model-name Qwen2-72B-Instruct --port 8000 --model /home/llm/local_models/Qwen/Qwen2-72B-Instruct --trust-remote-code --gpu-memory-utilization 0.90 --device xpu --dtype float16 --enforce-eager --load-in-low-bit fp8 --max-model-len 6656 --max-num-batched-tokens 6656 --tensor-parallel-size 8
Uptime: 3880.324215 s
start_vllm_arc.sh: line 28: 191503 Aborted (core dumped) python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server --served-model-name $served_model_name --port 8000 --model $model --trust-remote-code --gpu-memory-utilization 0.90 --device xpu --dtype float16 --enforce-eager --load-in-low-bit fp8 --max-model-len 6656 --max-num-batched-tokens 6656 --tensor-parallel-size 8
(ipex-llm-0816) llm@GPU-Xeon4410Y-ARC770:~/ipex-llm-0816/python/llm/scripts$ bash env-check.sh
-----------------------------------------------------------------
PYTHON_VERSION=3.11.9
-----------------------------------------------------------------
Transformers is not installed.
-----------------------------------------------------------------
PyTorch is not installed.
-----------------------------------------------------------------
ipex-llm Version: 2.1.0b20240815
-----------------------------------------------------------------
IPEX is not installed.
-----------------------------------------------------------------
CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Silver 4410Y
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
Stepping: 8
CPU max MHz: 3900.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
-----------------------------------------------------------------
Total CPU Memory: 755.542 GB
-----------------------------------------------------------------
Operating System:
Ubuntu 22.04.4 LTS \n \l
-----------------------------------------------------------------
Linux GPU-Xeon4410Y-ARC770 6.5.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul 15 16:40:02 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
CLI:
Version: 1.2.27.20240626
Build ID: 7f002d24
Service:
Version: 1.2.27.20240626
Build ID: 7f002d24
Level Zero Version: 1.16.0
-----------------------------------------------------------------
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
-----------------------------------------------------------------
Driver related package version:
ii intel-fw-gpu 2024.17.5-329~22.04 all Firmware package for Intel integrated and discrete GPUs
ii intel-i915-dkms 1.24.3.23.240419.26+i30-1 all Out of tree i915 driver.
ii intel-level-zero-gpu 1.3.29138.7 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii level-zero-dev 1.16.15-881~22.04 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
-----------------------------------------------------------------
env-check.sh: line 167: sycl-ls: command not found
igpu not detected
-----------------------------------------------------------------
xpu-smi is properly installed.
-----------------------------------------------------------------
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0019-0000-000856a08086 |
| | PCI BDF Address: 0000:19:00.0 |
| | DRM Device: /dev/dri/card1 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 1 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-002c-0000-000856a08086 |
| | PCI BDF Address: 0000:2c:00.0 |
| | DRM Device: /dev/dri/card2 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 2 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0052-0000-000856a08086 |
| | PCI BDF Address: 0000:52:00.0 |
| | DRM Device: /dev/dri/card3 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 3 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0065-0000-000856a08086 |
| | PCI BDF Address: 0000:65:00.0 |
| | DRM Device: /dev/dri/card4 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 4 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-009b-0000-000856a08086 |
| | PCI BDF Address: 0000:9b:00.0 |
| | DRM Device: /dev/dri/card5 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 5 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-00ad-0000-000856a08086 |
| | PCI BDF Address: 0000:ad:00.0 |
| | DRM Device: /dev/dri/card6 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 6 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-00d1-0000-000856a08086 |
| | PCI BDF Address: 0000:d1:00.0 |
| | DRM Device: /dev/dri/card7 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 7 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-00e3-0000-000856a08086 |
| | PCI BDF Address: 0000:e3:00.0 |
| | DRM Device: /dev/dri/card8 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
GPU0 Memory size=16M
GPU1 Memory size=16G
GPU2 Memory size=16G
GPU3 Memory size=16G
GPU4 Memory size=16G
GPU5 Memory size=16G
GPU6 Memory size=16G
GPU7 Memory size=16G
GPU8 Memory size=16G
-----------------------------------------------------------------
03:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) (prog-if 00 [VGA controller])
DeviceName: Onboard VGA
Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
Flags: medium devsel, IRQ 16, NUMA node 0
Memory at 94000000 (32-bit, non-prefetchable) [size=16M]
Memory at 95000000 (32-bit, non-prefetchable) [size=256K]
I/O ports at 2000 [size=128]
Capabilities: <access denied>
Kernel driver in use: ast
Kernel modules: ast
--
19:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 130, NUMA node 0
Memory at 9e000000 (64-bit, non-prefetchable) [size=16M]
Memory at 5f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at 9f000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
--
2c:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 133, NUMA node 0
Memory at a8000000 (64-bit, non-prefetchable) [size=16M]
Memory at 6f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at a9000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
--
52:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 136, NUMA node 0
Memory at bc000000 (64-bit, non-prefetchable) [size=16M]
Memory at 8f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at bd000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
--
65:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 139, NUMA node 0
Memory at c6000000 (64-bit, non-prefetchable) [size=16M]
Memory at 9f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at c7000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
--
9b:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 142, NUMA node 1
Memory at d8000000 (64-bit, non-prefetchable) [size=16M]
Memory at cf800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at d9000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
--
ad:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 145, NUMA node 1
Memory at e0000000 (64-bit, non-prefetchable) [size=16M]
Memory at df800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at e1000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
--
d1:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 148, NUMA node 1
Memory at f1000000 (64-bit, non-prefetchable) [size=16M]
Memory at ff800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at f2000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
--
e3:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1020
Flags: bus master, fast devsel, latency 0, IRQ 151, NUMA node 1
Memory at f9000000 (64-bit, non-prefetchable) [size=16M]
Memory at 10f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at fa000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915
-----------------------------------------------------------------
Hi, this problem is due to running out of GPU memory. You can reduce --gpu-memory-utilization, or reduce --max-num-batched-tokens.
Using the following command should fix the problem:
#!/bin/bash
model="/home/llm/local_models/Qwen/Qwen2-72B-Instruct"
served_model_name="Qwen2-72B-Instruct"
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export TORCH_LLM_ALLREDUCE=0
export CCL_DG2_ALLREDUCE=1
# Tensor parallel related arguments:
export CCL_WORKER_COUNT=2
export FI_PROVIDER=shm
export CCL_ATL_TRANSPORT=ofi
export CCL_ZE_IPC_EXCHANGE=sockets
export CCL_ATL_SHM=1
source /opt/intel/1ccl-wks/setvars.sh
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
--served-model-name $served_model_name \
--port 8000 \
--model $model \
--trust-remote-code \
--gpu-memory-utilization 0.85 \
--device xpu \
--dtype float16 \
--enforce-eager \
--load-in-low-bit fp8 \
--max-model-len 4000 \
--max-num-batched-tokens 4000 \
--tensor-parallel-size 8
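For a rough sense of the memory pressure, here is a hedged back-of-envelope sketch (the model dimensions are assumptions taken from the published Qwen2-72B config: 80 layers, 8 KV heads, head dim 128, fp16 KV cache at 2 bytes per element; this is not part of vLLM or ipex-llm itself):
#!/bin/bash
# Hypothetical estimate of KV-cache cost, assuming Qwen2-72B dimensions.
layers=80; kv_heads=8; head_dim=128; bytes_per_elem=2   # fp16 KV cache
tp=8                                                    # --tensor-parallel-size
tokens=6656                                             # old --max-model-len
kv_per_token=$(( 2 * layers * kv_heads * head_dim * bytes_per_elem ))  # K and V
echo "KV cache per token, whole model: $(( kv_per_token / 1024 )) KiB"                          # ~320 KiB
echo "KV cache per GPU at $tokens tokens: $(( kv_per_token * tokens / tp / 1024 / 1024 )) MiB"  # ~260 MiB
Each 16 GB A770 also holds its fp8 weight shard (roughly 72B parameters / 8 GPUs, about 9 GB) plus profiling activations and oneCCL buffers, so the headroom is thin; lowering --gpu-memory-utilization and the token limits keeps the allocator inside it.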
With the ipex-llm docker container intelanalytics/ipex-llm-serving-vllm-xpu-experiment:2.1.0b2, the model loads successfully on 4 ARC GPUs. But when loading the model on 8 ARC GPUs, it fails with the following error:
root@GPU-Xeon4410Y-ARC770:/llm# bash start-vllm-service.sh
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 267, in <module>
engine = IPEXLLMAsyncLLMEngine.from_engine_args(engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 57, in from_engine_args
engine = cls(parallel_config.worker_use_ray,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 30, in init
super().init(*args, kwargs)
File "/llm/vllm/vllm/engine/async_llm_engine.py", line 309, in init
self.engine = self._init_engine(*args, *kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/engine/async_llm_engine.py", line 409, in _init_engine
return engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/engine/llm_engine.py", line 106, in init
self.model_executor = executor_class(model_config, cache_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 77, in init
self._init_cache()
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 249, in _init_cache
num_blocks = self._run_workers(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 347, in _run_workers
driver_worker_output = getattr(self.driver_worker,
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/worker/worker.py", line 136, in profile_num_available_blocks
self.model_runner.profile_run()
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/worker/model_runner.py", line 645, in profile_run
self.execute_model(seqs, kv_caches)
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/worker/model_runner.py", line 581, in execute_model
hidden_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/model_executor/models/qwen2.py", line 316, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/model_executor/models/qwen2.py", line 257, in forward
hidden_states, residual = layer(
^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/model_executor/models/qwen2.py", line 208, in forward
hidden_states, residual = self.input_layernorm(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/model_executor/layers/layernorm.py", line 52, in forward
ops.fused_add_rms_norm(
TypeError: fused_add_rms_norm(): incompatible function arguments. The following argument types are supported:
2024-08-14 11:07:55,600 - INFO - intel_extension_for_pytorch auto imported
INFO 08-14 11:07:56 api_server.py:258] vLLM API server version 0.3.3
INFO 08-14 11:07:56 api_server.py:259] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name='Qwen1.5-7B-Chat', lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], load_in_low_bit='fp6', model='/llm/models/Qwen/Qwen1.5-7B-Chat', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir=None, load_format='auto', dtype='float16', kv_cache_dtype='auto', max_model_len=4096, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=8, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, seed=0, swap_space=4, gpu_memory_utilization=0.75, max_num_batched_tokens=10240, max_num_seqs=12, max_paddings=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=True, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='xpu', engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 08-14 11:07:56 config.py:710] Casting torch.bfloat16 to torch.float16.
INFO 08-14 11:07:56 config.py:523] Custom all-reduce kernels are temporarily disabled due to stability issues. We will re-enable them once the issues are resolved.
2024-08-14 11:07:58,897 INFO worker.py:1788 -- Started a local Ray instance.
INFO 08-14 11:07:59 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model='/llm/models/Qwen/Qwen1.5-7B-Chat', tokenizer='/llm/models/Qwen/Qwen1.5-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=8, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=xpu, seed=0, max_num_batched_tokens=10240, max_num_seqs=12, max_model_len=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(RayWorkerVllm pid=32282) /usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
(RayWorkerVllm pid=32282) warn(
(RayWorkerVllm pid=32483) 2024-08-14 11:08:17,825 - INFO - intel_extension_for_pytorch auto imported
INFO 08-14 11:08:18 attention.py:71] flash_attn is not found. Using xformers backend.
(RayWorkerVllm pid=32094) INFO 08-14 11:08:18 attention.py:71] flash_attn is not found. Using xformers backend.
2024-08-14 11:08:19,069 - INFO - Converting the current model to fp6 format......
2024-08-14 11:08:19,069 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
[2024-08-14 11:08:20,124] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
(RayWorkerVllm pid=32483) 2024-08-14 11:08:20,271 - INFO - Converting the current model to fp6 format......
(RayWorkerVllm pid=32483) 2024-08-14 11:08:20,272 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
(RayWorkerVllm pid=32094) /usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? [repeated 6x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(RayWorkerVllm pid=32094) warn( [repeated 6x across cluster]
2024-08-14 11:08:21,272 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
(RayWorkerVllm pid=32483) [2024-08-14 11:08:21,256] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
INFO 08-14 11:08:21 model_convert.py:249] Loading model weights took 1.0264 GB
(RayWorkerVllm pid=32349) 2024-08-14 11:08:18,290 - INFO - intel_extension_for_pytorch auto imported [repeated 6x across cluster]
(RayWorkerVllm pid=32551) 2024-08-14 11:08:20,708 - INFO - Converting the current model to fp6 format...... [repeated 6x across cluster]
(RayWorkerVllm pid=32483) 2024-08-14 11:08:25,761 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations [repeated 7x across cluster]
(RayWorkerVllm pid=32483) INFO 08-14 11:08:26 model_convert.py:249] Loading model weights took 1.0264 GB
(RayWorkerVllm pid=32551) INFO 08-14 11:08:18 attention.py:71] flash_attn is not found. Using xformers backend. [repeated 6x across cluster]
(RayWorkerVllm pid=32551) [2024-08-14 11:08:21,778] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect) [repeated 6x across cluster]
2024:08:14-11:08:27:(28904) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2024:08:14-11:08:27:(28904) |CCL_WARN| fallback to 'sockets' mode of ze exchange mechanism, to use CCL_ZE_IPC_EXHANGE=drmfd, set CCL_LOCAL_RANK/SIZE explicitly or use process launcher
(RayWorkerVllm pid=32094) 2024:08:14-11:08:28:(32094) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
(RayWorkerVllm pid=32094) 2024:08:14-11:08:28:(32094) |CCL_WARN| fallback to 'sockets' mode of ze exchange mechanism, to use CCL_ZE_IPC_EXHANGE=drmfd, set CCL_LOCAL_RANK/SIZE explicitly or use process launcher
2024:08:14-11:08:29:(33884) |CCL_WARN| no membind support for NUMA node 1, skip thread membind
2024:08:14-11:08:29:(33896) |CCL_WARN| no membind support for NUMA node 1, skip thread membind
(RayWorkerVllm pid=32094) 2024:08:14-11:08:29:(33886) |CCL_WARN| no membind support for NUMA node 1, skip thread membind
(RayWorkerVllm pid=32094) 2024:08:14-11:08:29:(33892) |CCL_WARN| no membind support for NUMA node 1, skip thread membind
2024:08:14-11:08:30:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices
(RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices
(RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0.
This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. 
If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. 
This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. 
If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. 
This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices (RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. 
If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. 
If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. 
If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices 2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. 
(RayWorkerVllm pid=32162) INFO 08-14 11:08:27 model_convert.py:249] Loading model weights took 1.0264 GB [repeated 6x across cluster]
Traceback (most recent call last):
File "Invoked with: (tensor([[[-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
         [-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
         [-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
         ...,
         [-0.0240,  0.0522,  0.0044,  ..., -0.0462,  0.1114,  0.0284],
         [-0.0240,  0.0522,  0.0044,  ..., -0.0462,  0.1114,  0.0284],
         [-0.0240,  0.0522,  0.0044,  ..., -0.0462,  0.1114,  0.0284]],
(RayWorkerVllm pid=32551) 2024:08:14-11:08:28:(32551) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL [repeated 6x across cluster]
(RayWorkerVllm pid=32551) 2024:08:14-11:08:28:(32551) |CCL_WARN| fallback to 'sockets' mode of ze exchange mechanism, to use CCL_ZE_IPC_EXHANGE=drmfd, set CCL_LOCAL_RANK/SIZE explicitly or use process launcher [repeated 6x across cluster]
(RayWorkerVllm pid=32551) 2024:08:14-11:08:29:(33894) |CCL_WARN| no membind support for NUMA node 0, skip thread membind [repeated 12x across cluster]
(RayWorkerVllm pid=32551) 2024:08:14-11:08:32:(32551) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 728x across cluster]
(RayWorkerVllm pid=32162) 2024-08-14 11:08:27,278 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations [repeated 6x across cluster]
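For anyone drowning in the same CCL_WARN spam: the warning names the switch itself. Below is a minimal sketch of disabling oneCCL's topology recognition before the engine and its workers start; the serving command is a placeholder, not the exact one used in this issue, and the override should only be applied if the detected PCIe-only topology really is wrong for your hardware, since oneCCL will then assume XeLink connectivity between devices.

```python
import os
import subprocess

# Disable oneCCL topology recognition, as suggested by the CCL_WARN message above.
# oneCCL will then assume XeLink connectivity across devices, so only set this
# if the PCIe-only detection is actually incorrect for your machine.
env = dict(os.environ, CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK="0")

# The variable must be visible to every worker process; launching the server
# with the modified environment propagates it to processes forked from it.
# Placeholder command -- substitute your actual vLLM/ipex-llm serving invocation.
subprocess.run(
    ["python", "-m", "vllm.entrypoints.openai.api_server",
     "--model", "YOUR_MODEL", "--tensor-parallel-size", "2"],
    env=env,
    check=True,
)
```

Note that if the Ray cluster was started separately with `ray start`, the workers inherit the raylet's environment rather than the driver's, so the variable likely needs to be exported in that shell as well before the cluster comes up.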