fu1996 opened 1 week ago

There are no files worth attaching, but I have captured the relevant error call stack from the logs:
INFO 10-10 00:24:11 async_llm_engine.py:174] Added request cmpl-ea7ce76a97a84141911213d6779c3f25-0.
end] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Channel 07/0 : 0[0] -> 8[0] [send] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Channel 09/0 : 0[0] -> 8[0] [send] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Channel 10/0 : 0[0] -> 8[0] [send] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Channel 11/0 : 0[0] -> 8[0] [send] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Channel 12/0 : 0[0] -> 8[0] [send] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Channel 13/0 : 0[0] -> 8[0] [send] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Channel 14/0 : 0[0] -> 8[0] [send] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Channel 15/0 : 0[0] -> 8[0] [send] via NET/IBext/0/GDRDMA
VM-16-5-centos:39923:78857 [0] NCCL INFO Connected NVLS tree
VM-16-5-centos:39923:78857 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
VM-16-5-centos:39923:78857 [0] NCCL INFO 16 coll channels, 0 collnet channels, 16 nvls channels, 16 p2p channels, 2 p2p channels per peer
VM-16-5-centos:39923:78857 [0] NCCL INFO comm 0x56501150c320 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 3000 commId 0x1c6daa4f776f8e93 - Init COMPLETE
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 08/1 : 15[7] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 09/1 : 15[7] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 08/1 : 14[6] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 09/1 : 14[6] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 08/1 : 13[5] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 09/1 : 13[5] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 08/1 : 12[4] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 09/1 : 12[4] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 08/1 : 11[3] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 09/1 : 11[3] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 08/1 : 10[2] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 09/1 : 10[2] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 08/1 : 9[1] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 09/1 : 9[1] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 08/1 : 8[0] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
VM-16-5-centos:39923:78896 [0] NCCL INFO Channel 09/1 : 8[0] -> 0[0] [receive] via NET/IBext/0/GDRDMA/Shared
[rank0]:[E1010 00:24:11.022425711 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f8eefd77f86 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f8eefd26d10 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f8ef010cf08 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f8ea1d533e6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f8ea1d58600 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f8ea1d5f2ba in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f8ea1d616fc in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f8eef4b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f8ef10a6ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f8ef1138850 in /usr/lib/x86_64-linux-gnu/libc.so.6)
ERROR 10-10 00:24:11 worker_base.py:386] Error executing method execute_model. This might cause deadlock in distributed execution.
ERROR 10-10 00:24:11 worker_base.py:386] Traceback (most recent call last):
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 378, in execute_method
ERROR 10-10 00:24:11 worker_base.py:386]     return executor(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 273, in execute_model
ERROR 10-10 00:24:11 worker_base.py:386]     output = self.model_runner.execute_model(
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 10-10 00:24:11 worker_base.py:386]     return func(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1363, in execute_model
ERROR 10-10 00:24:11 worker_base.py:386]     hidden_or_intermediate_states = model_executable(
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return self._call_impl(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return forward_call(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 422, in forward
ERROR 10-10 00:24:11 worker_base.py:386]     model_output = self.model(input_ids, positions, kv_caches,
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return self._call_impl(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return forward_call(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 322, in forward
ERROR 10-10 00:24:11 worker_base.py:386]     hidden_states, residual = layer(
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return self._call_impl(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return forward_call(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 245, in forward
ERROR 10-10 00:24:11 worker_base.py:386]     hidden_states = self.self_attn(
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return self._call_impl(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return forward_call(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 172, in forward
ERROR 10-10 00:24:11 worker_base.py:386]     qkv, _ = self.qkv_proj(hidden_states)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return self._call_impl(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 00:24:11 worker_base.py:386]     return forward_call(*args, **kwargs)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py", line 334, in forward
ERROR 10-10 00:24:11 worker_base.py:386]     output_parallel = self.quant_method.apply(self, input_, bias)
ERROR 10-10 00:24:11 worker_base.py:386]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py", line 122, in apply
ERROR 10-10 00:24:11 worker_base.py:386]     return F.linear(x, layer.weight, bias)
ERROR 10-10 00:24:11 worker_base.py:386] RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
[2024-10-10 00:24:11,686 E 39923 41512] logging.cc:108: Unhandled exception: N3c1016DistBackendErrorE. what(): [PG 3 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f8eefd77f86 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f8eefd26d10 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f8ef010cf08 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f8ea1d533e6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f8ea1d58600 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f8ea1d5f2ba in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f8ea1d616fc in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f8eef4b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f8ef10a6ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7f8ef1138850 in /usr/lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1521 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f8eefd77f86 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe5aa84 (0x7f8ea19eaa84 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xdc253 (0x7f8eef4b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x94ac3 (0x7f8ef10a6ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7f8ef1138850 in /usr/lib/x86_64-linux-gnu/libc.so.6)
ERROR 10-10 00:24:11 async_llm_engine.py:57] Engine background task failed
ERROR 10-10 00:24:11 async_llm_engine.py:57] Traceback (most recent call last):
ERROR 10-10 00:24:11 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 47, in _log_task_completion
ERROR 10-10 00:24:11 async_llm_engine.py:57]     return_value = task.result()
ERROR 10-10 00:24:11 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 642, in run_engine_loop
ERROR 10-10 00:24:11 async_llm_engine.py:57]     result = task.result()
ERROR 10-10 00:24:11 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 585, in engine_step
ERROR 10-10 00:24:11 async_llm_engine.py:57]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 10-10 00:24:11 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 254, in step_async
ERROR 10-10 00:24:11 async_llm_engine.py:57]     output = await self.model_executor.execute_model_async(
ERROR 10-10 00:24:11 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 470, in execute_model_async
ERROR 10-10 00:24:11 async_llm_engine.py:57]     return await super().execute_model_async(execute_model_req)
ERROR 10-10 00:24:11 async_llm_engine.py:57]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/distributed_gpu_executor.py", line 175, in execute_model_async
ERROR 10-10 00:24:11 async_llm_engine.py:57]     return await self._driver_execute_model_async(execute_model_req)
ERROR 10-10 00:24:11 async_llm_engine.py:57]   File
"/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 486, in _driver_execute_model_async ERROR 10-10 00:24:11 async_llm_engine.py:57] return await self.driver_exec_method("execute_model", ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run ERROR 10-10 00:24:11 async_llm_engine.py:57] result = self.fn(*self.args, **self.kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 387, in execute_method ERROR 10-10 00:24:11 async_llm_engine.py:57] raise e ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 378, in execute_method ERROR 10-10 00:24:11 async_llm_engine.py:57] return executor(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 273, in execute_model ERROR 10-10 00:24:11 async_llm_engine.py:57] output = self.model_runner.execute_model( ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context ERROR 10-10 00:24:11 async_llm_engine.py:57] return func(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1363, in execute_model ERROR 10-10 00:24:11 async_llm_engine.py:57] hidden_or_intermediate_states = model_executable( ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return self._call_impl(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return 
forward_call(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 422, in forward ERROR 10-10 00:24:11 async_llm_engine.py:57] model_output = self.model(input_ids, positions, kv_caches, ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return self._call_impl(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return forward_call(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 322, in forward ERROR 10-10 00:24:11 async_llm_engine.py:57] hidden_states, residual = layer( ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return self._call_impl(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return forward_call(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 245, in forward ERROR 10-10 00:24:11 async_llm_engine.py:57] hidden_states = self.self_attn( ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return self._call_impl(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File 
"/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return forward_call(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 172, in forward ERROR 10-10 00:24:11 async_llm_engine.py:57] qkv, _ = self.qkv_proj(hidden_states) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return self._call_impl(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl ERROR 10-10 00:24:11 async_llm_engine.py:57] return forward_call(*args, **kwargs) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py", line 334, in forward ERROR 10-10 00:24:11 async_llm_engine.py:57] output_parallel = self.quant_method.apply(self, input_, bias) ERROR 10-10 00:24:11 async_llm_engine.py:57] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py", line 122, in apply ERROR 10-10 00:24:11 async_llm_engine.py:57] return F.linear(x, layer.weight, bias) ERROR 10-10 00:24:11 async_llm_engine.py:57] RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)` ERROR:asyncio:Exception in callback _log_task_completion(error_callback=<bound method...7f8d7429f970>>)(<Task finishe...TENSOR_OP)`')>) at /usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py:37 handle: <Handle _log_task_completion(error_callback=<bound method...7f8d7429f970>>)(<Task finishe...TENSOR_OP)`')>) at 
/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py:37>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 47, in _log_task_completion
    return_value = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 642, in run_engine_loop
    result = task.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 585, in engine_step
    request_outputs = await self.engine.step_async(virtual_engine)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 254, in step_async
    output = await self.model_executor.execute_model_async(
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 470, in execute_model_async
    return await super().execute_model_async(execute_model_req)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/distributed_gpu_executor.py", line 175, in execute_model_async
    return await self._driver_execute_model_async(execute_model_req)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 486, in _driver_execute_model_async
    return await self.driver_exec_method("execute_model",
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 387, in execute_method
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 378, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 273, in execute_model
    output = self.model_runner.execute_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1363, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 422, in forward
    model_output = self.model(input_ids, positions, kv_caches,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 322, in forward
    hidden_states, residual = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 245, in forward
    hidden_states = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 172, in forward
    qkv, _ = self.qkv_proj(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py", line 334, in forward
    output_parallel = self.quant_method.apply(self, input_, bias)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/linear.py", line 122, in apply
    return F.linear(x, layer.weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 59, in _log_task_completion
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 10-10 00:24:11 async_llm_engine.py:181] Aborted request cmpl-7179337ad6fe4efaa13236d16aa59ec1-0.
INFO 10-10 00:24:11 async_llm_engine.py:181] Aborted request cmpl-5b9eabe0d8dc43178d4f3bb359d17737-0.
INFO 10-10 00:24:11 async_llm_engine.py:181] Aborted request cmpl-ea7ce76a97a84141911213d6779c3f25-0.
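Note that the watchdog message above warns the illegal memory access is reported asynchronously, so the Python frame it lands on (`F.linear` inside `qkv_proj`) may not be the actual faulting kernel. This is the sketch I plan to use to re-run with synchronous error reporting; the environment variables are real PyTorch/NCCL/vLLM debug switches, but the commented launch line is only a placeholder, not my exact serving command:

```shell
# Sketch: re-run with the debug switches the error message itself suggests.
export CUDA_LAUNCH_BLOCKING=1   # surface CUDA errors at the faulting call, not later
export NCCL_DEBUG=INFO          # keep NCCL init/teardown logs for the failing comm
export VLLM_TRACE_FUNCTION=1    # vLLM function tracing, useful for crash reports
# Placeholder invocation -- substitute the real model path and parallel sizes:
# python -m vllm.entrypoints.openai.api_server --model <model> --tensor-parallel-size 16
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING NCCL_DEBUG=$NCCL_DEBUG"
```

With `CUDA_LAUNCH_BLOCKING=1` the stack trace should point at the kernel that actually faulted, at the cost of serialized (slower) execution.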
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f86df6e2fe0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 754, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 774, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 295, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 75, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f86df6e3c40

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 754, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 774, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 295, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 75, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 258, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
[2024-10-10 00:24:11,693 E 39923 41512] logging.cc:115: Stack trace:
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x107b84a) [0x7f8d86daf84a] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0x107ead2) [0x7f8d86db2ad2] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f8eef48220c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f8eef482277]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae1fe) [0x7f8eef4821fe]
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xe5ab35) [0x7f8ea19eab35] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f8eef4b0253]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f8ef10a6ac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f8ef1138850]

*** SIGABRT received at time=1728491051 on cpu 150 ***
PC: @ 0x7f8ef10a89fc (unknown) pthread_kill @ 0x7f8ef1054520 (unknown) (unknown) [2024-10-10 00:24:11,693 E 39923 41512] logging.cc:440: *** SIGABRT received at time=1728491051 on cpu 150 *** [2024-10-10 00:24:11,693 E 39923 41512] logging.cc:440: PC: @ 0x7f8ef10a89fc (unknown) pthread_kill [2024-10-10 00:24:11,693 E 39923 41512] logging.cc:440: @ 0x7f8ef1054520 (unknown) (unknown) Fatal Python error: Aborted Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, yaml._yaml, psutil._psutil_linux, psutil._psutil_posix, sentencepiece._sentencepiece, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, cython.cimports.libc.math, PIL._imaging, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, 
pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, _cffi_backend, pyarrow._parquet, pyarrow._fs, pyarrow._hdfs, pyarrow._s3fs, xxhash._xxhash, pyarrow._json, markupsafe._speedups, zmq.backend.cython.context, zmq.backend.cython.message, zmq.backend.cython.socket, zmq.backend.cython._device, zmq.backend.cython._poll, zmq.backend.cython._proxy_steerable, zmq.backend.cython._version, zmq.backend.cython.error, zmq.backend.cython.utils (total: 109) INFO 10-10 00:24:11 logger.py:36] Received request cmpl-5c3567835cf143ce89abfb6abde149e7-0: prompt: '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n阅读下面的CONTEXT,并完成TASK<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n<CONTEXT>\n SAIGroup appointed Michael Healy as senior investment partner to focus on healthcare and other sectors. 
Healy has over a decade of experience investing in high-growth healthcare companies, leading $4 billion+ in transactions.\n</CONTEXT>\n\n<TASK>\n请抽取上面文段中的所有适应症名称,比如肺癌、胃癌、结直肠癌、脑胶质瘤、NHL、特应性皮炎、糖尿病、肥胖等,以列表的形式返回\n- disease_list=["适应症名1", "适应症名2", "适应症名3", ...]\n</TASK>\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n- disease_list=', params: SamplingParams(n=1, best_of=1, presence_penalty=2.0, frequency_penalty=0.2, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['<|eot_id|>'], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=800, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [128000, 128000, 128006, 9125, 128007, 271, 108414, 17297, 115070, 99465, 91495, 61648, 66913, 128009, 128006, 882, 128007, 271, 27, 99465, 397, 16998, 1953, 896, 21489, 8096, 1283, 5893, 439, 10195, 9341, 8427, 311, 5357, 389, 18985, 323, 1023, 26593, 13, 1283, 5893, 706, 927, 264, 13515, 315, 3217, 26012, 304, 1579, 2427, 19632, 18985, 5220, 11, 6522, 400, 19, 7239, 10, 304, 14463, 627, 524, 99465, 1363, 3203, 7536, 397, 15225, 116602, 18655, 17905, 28190, 17161, 38574, 105363, 56438, 108562, 51611, 111571, 31091, 126900, 30624, 57942, 118, 23706, 234, 5486, 91939, 225, 23706, 234, 5486, 37985, 74245, 57942, 254, 23706, 234, 5486, 108851, 123199, 103706, 114431, 97, 5486, 45, 13793, 5486, 66378, 51611, 34171, 105871, 114052, 5486, 117587, 126017, 103429, 5486, 117178, 91939, 244, 50667, 105610, 45277, 9554, 115707, 32626, 198, 12, 8624, 2062, 29065, 108562, 51611, 111571, 13372, 16, 498, 330, 108562, 51611, 111571, 13372, 17, 498, 330, 108562, 51611, 111571, 13372, 18, 498, 2564, 933, 524, 66913, 397, 128009, 128006, 78191, 128007, 271, 12, 8624, 2062, 28], lora_request: None, prompt_adapter_request: None. 
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d '
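The `resource_tracker` warning above means a `multiprocessing.shared_memory` segment was still registered when the interpreter shut down — expected here, since the worker aborted before vLLM could clean up. For reference, a minimal sketch of the create/close/unlink lifecycle that leaves nothing for the tracker to reclaim (the segment contents are illustrative):

```python
from multiprocessing import shared_memory

# Create a small shared-memory segment, use it, then release it explicitly.
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:4] = b"ping"        # write through the backing memoryview
    data = bytes(shm.buf[:4])    # read it back
finally:
    shm.close()   # detach this process's mapping
    shm.unlink()  # destroy the segment; skipping this is what triggers the warning
```

A process that dies via SIGABRT never reaches the `finally` block, which is why the tracker reports the leak at shutdown.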
The service was started as follows. Note that in the original one-liner the `MODEL_PATH=...` prefix does not affect `$MODEL_PATH` on the same command line (the shell expands arguments before applying the temporary assignment), so the assignment needs to be a separate statement:

```shell
MODEL_PATH="/data/ckpts/405B-instruct"
nohup python -m vllm.entrypoints.openai.api_server \
  --model "$MODEL_PATH" \
  --swap-space 32 \
  --tensor-parallel-size 16 \
  --served-model-name llama3-1-405B \
  --host 0.0.0.0 --port 8081 \
  --max-num-seqs 1024 \
  --max-num-batched-tokens 8192 \
  --gpu-memory-utilization 0.9 \
  --enforce-eager >> /tmp/model_server_api_pre.log 2>&1 &
```
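With these flags the server exposes an OpenAI-compatible completions endpoint on port 8081. As a sketch, a request body matching the `SamplingParams` captured in the log would look like the following (field values are copied from the log; the prompt placeholder is illustrative, and actually sending it requires the server to be up):

```python
import json

# Completion request mirroring the sampling parameters seen in the captured log.
payload = {
    "model": "llama3-1-405B",      # matches --served-model-name
    "prompt": "<prompt text>",     # placeholder for the CONTEXT/TASK prompt
    "max_tokens": 800,
    "temperature": 0.0,
    "presence_penalty": 2.0,
    "frequency_penalty": 0.2,
    "stop": ["<|eot_id|>"],
}
# This JSON would be POSTed to http://<host>:8081/v1/completions
body = json.dumps(payload)
```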
```shell
pip install nvidia-cublas-cu12==12.4.5.8
```
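Note that the environment dump below still reports `nvidia-cublas-cu12==12.3.4.1`, not the pinned `12.4.5.8`, so it is worth confirming which wheel actually ended up installed. A small sketch using only the standard library (the package names are the real wheel names, but whether they are present depends on the environment):

```python
from importlib import metadata

def installed_version(pkg: str):
    """Return the installed version of `pkg`, or None if it is not installed."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# e.g. compare the CUDA-related wheels against what the service was started with
for pkg in ("nvidia-cublas-cu12", "nvidia-nccl-cu12", "torch"):
    print(pkg, "->", installed_version(pkg))
```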
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.4.119-19.0009.28-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA H20
GPU 1: NVIDIA H20
GPU 2: NVIDIA H20
GPU 3: NVIDIA H20
GPU 4: NVIDIA H20
GPU 5: NVIDIA H20
GPU 6: NVIDIA H20
GPU 7: NVIDIA H20

Nvidia driver version: 535.161.07
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.0.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.0.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Address sizes:       52 bits physical, 57 bits virtual
Byte Order:          Little Endian
CPU(s):              384
On-line CPU(s) list: 0-383
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Advanced Micro Devices, Inc.
Model name:          AMD EPYC 9K84 96-Core Processor
BIOS Model name:     AMD EPYC 9K84 96-Core Processor
CPU family:          25
Model:               17
Thread(s) per core:  2
Core(s) per socket:  96
Socket(s):           2
Stepping:            1
Frequency boost:     enabled
CPU max MHz:         2600.0000
CPU min MHz:         1500.0000
BogoMIPS:            5200.25
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization:      AMD-V
L1d cache:           6 MiB (192 instances)
L1i cache:           6 MiB (192 instances)
L2 cache:            192 MiB (192 instances)
L3 cache:            768 MiB (24 instances)
NUMA node(s):        2
NUMA node0 CPU(s):   0-95,192-287
NUMA node1 CPU(s):   96-191,288-383
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] nvidia-cublas-cu12==12.3.4.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-dali-cuda120==1.35.0
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] nvidia-pyindex==1.0.9
[pip3] onnx==1.15.0rc2
[pip3] optree==0.10.0
[pip3] pynvml==11.4.1
[pip3] pytorch-quantization==2.1.2
[pip3] pytorch-triton==2.2.0+e28a256d7
[pip3] pyzmq==25.1.2
[pip3] torch==2.4.0
[pip3] torch-tensorrt==2.3.0a0
[pip3] torchdata==0.7.1a0
[pip3] torchtext==0.17.0a0
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.4@4db5176d9758b720b05460c50ace3c01026eb158
vLLM Build Flags:
CUDA Archs: 5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  NIC1  NIC2  NIC3  NIC4  NIC5  NIC6  NIC7  CPU Affinity    NUMA Affinity  GPU NUMA ID
GPU0   X    NV18  NV18  NV18  NV18  NV18  NV18  NV18  PIX   NODE  NODE  NODE  SYS   SYS   SYS   SYS   0-95,192-287    0              N/A
GPU1  NV18   X    NV18  NV18  NV18  NV18  NV18  NV18  NODE  PIX   PHB   NODE  SYS   SYS   SYS   SYS   0-95,192-287    0              N/A
GPU2  NV18  NV18   X    NV18  NV18  NV18  NV18  NV18  NODE  PHB   PIX   NODE  SYS   SYS   SYS   SYS   0-95,192-287    0              N/A
GPU3  NV18  NV18  NV18   X    NV18  NV18  NV18  NV18  NODE  NODE  NODE  PIX   SYS   SYS   SYS   SYS   0-95,192-287    0              N/A
GPU4  NV18  NV18  NV18  NV18   X    NV18  NV18  NV18  SYS   SYS   SYS   SYS   PIX   NODE  NODE  NODE  96-191,288-383  1              N/A
GPU5  NV18  NV18  NV18  NV18  NV18   X    NV18  NV18  SYS   SYS   SYS   SYS   NODE  PIX   NODE  NODE  96-191,288-383  1              N/A
GPU6  NV18  NV18  NV18  NV18  NV18  NV18   X    NV18  SYS   SYS   SYS   SYS   NODE  NODE  PIX   PHB   96-191,288-383  1              N/A
GPU7  NV18  NV18  NV18  NV18  NV18  NV18  NV18   X    SYS   SYS   SYS   SYS   NODE  NODE  PHB   PIX   96-191,288-383  1              N/A
NIC0  PIX   NODE  NODE  NODE  SYS   SYS   SYS   SYS    X    NODE  NODE  NODE  SYS   SYS   SYS   SYS
NIC1  NODE  PIX   PHB   NODE  SYS   SYS   SYS   SYS   NODE   X    PHB   NODE  SYS   SYS   SYS   SYS
NIC2  NODE  PHB   PIX   NODE  SYS   SYS   SYS   SYS   NODE  PHB    X    NODE  SYS   SYS   SYS   SYS
NIC3  NODE  NODE  NODE  PIX   SYS   SYS   SYS   SYS   NODE  NODE  NODE   X    SYS   SYS   SYS   SYS
NIC4  SYS   SYS   SYS   SYS   PIX   NODE  NODE  NODE  SYS   SYS   SYS   SYS    X    NODE  NODE  NODE
NIC5  SYS   SYS   SYS   SYS   NODE  PIX   NODE  NODE  SYS   SYS   SYS   SYS   NODE   X    NODE  NODE
NIC6  SYS   SYS   SYS   SYS   NODE  NODE  PIX   PHB   SYS   SYS   SYS   SYS   NODE  NODE   X    PHB
NIC7  SYS   SYS   SYS   SYS   NODE  NODE  PHB   PIX   SYS   SYS   SYS   SYS   NODE  NODE  PHB    X

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:
  NIC0: mlx5_bond_0
  NIC1: mlx5_bond_1
  NIC2: mlx5_bond_2
  NIC3: mlx5_bond_3
  NIC4: mlx5_bond_4
  NIC5: mlx5_bond_5
  NIC6: mlx5_bond_6
  NIC7: mlx5_bond_7
```

### Model Input Dumps
There are no relevant files, but I have captured the relevant error call stack logs:
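Since the native stack trace points at `c10d::ProcessGroupNCCL::ncclCommWatchdog()`, more verbose NCCL/collective diagnostics may help localize the failure. A sketch of environment variables to export before launching the server; these are standard NCCL/PyTorch knobs, but which names apply is version-dependent (older torch releases use `NCCL_ASYNC_ERROR_HANDLING` rather than the `TORCH_`-prefixed name):

```shell
# Verbose NCCL logging (stderr, or to a file via NCCL_DEBUG_FILE)
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET

# Ask torch's NCCL process group to surface async errors rather than hang
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1

# Serialize CUDA kernel launches so crashes map to the offending call
export CUDA_LAUNCH_BLOCKING=1
```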
### 🐛 Describe the bug
### Before submitting a new issue...