vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
31.06k stars 4.72k forks

vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause. #3338

Closed EchoShoot closed 2 days ago

EchoShoot commented 8 months ago

afol-apiserver-72b-1 | (RayWorkerVllm pid=3779) [E ProcessGroupNCCL.cpp:475] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=16487777, OpType=ALLREDUCE, NumelIn=195911680, NumelOut=195911680, Timeout(ms)=1800000) ran for 1800203 milliseconds before timing out.
afol-apiserver-72b-1 | (RayWorkerVllm pid=3779) [E ProcessGroupNCCL.cpp:475] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=9388506, OpType=BROADCAST, NumelIn=1, NumelOut=1, Timeout(ms)=1800000) ran for 1800222 milliseconds before timing out.
afol-apiserver-72b-1 | [E ProcessGroupNCCL.cpp:475] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=9388514, OpType=BROADCAST, NumelIn=1, NumelOut=1, Timeout(ms)=1800000) ran for 1800872 milliseconds before timing out.
afol-apiserver-72b-1 | [E ProcessGroupNCCL.cpp:489] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
afol-apiserver-72b-1 | [E ProcessGroupNCCL.cpp:495] To avoid data inconsistency, we are taking the entire process down.
afol-apiserver-72b-1 | [E ProcessGroupNCCL.cpp:916] [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=9388514, OpType=BROADCAST, NumelIn=1, NumelOut=1, Timeout(ms)=1800000) ran for 1800872 milliseconds before timing out.
afol-apiserver-72b-1 | [2024-03-11 11:35:28,021 E 1 3935] logging.cc:97: Unhandled exception: St13runtime_error. what(): [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=9388514, OpType=BROADCAST, NumelIn=1, NumelOut=1, Timeout(ms)=1800000) ran for 1800872 milliseconds before timing out.
afol-apiserver-72b-1 | [2024-03-11 11:35:28,030 E 1 3935] logging.cc:104: Stack trace:
afol-apiserver-72b-1 | /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfebc9a) [0x7efd8dc6cc9a] ray::operator<<()
afol-apiserver-72b-1 | /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfee3d8) [0x7efd8dc6f3d8] ray::TerminateHandler()
afol-apiserver-72b-1 | /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7eff7dc8220c]
afol-apiserver-72b-1 | /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7eff7dc82277]
afol-apiserver-72b-1 | /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae1fe) [0x7eff7dc821fe]
afol-apiserver-72b-1 | /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xc86f5b) [0x7eff3924af5b] c10d::ProcessGroupNCCL::ncclCommWatchdog()
afol-apiserver-72b-1 | /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7eff7dcb0253]
afol-apiserver-72b-1 | /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7effb9bf8ac3]
afol-apiserver-72b-1 | /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7effb9c89a04] __clone
afol-apiserver-72b-1 |
afol-apiserver-72b-1 |
afol-apiserver-72b-1 | SIGABRT received at time=1710156928 on cpu 62
afol-apiserver-72b-1 | PC: @ 0x7effb9bfa9fc (unknown) pthread_kill
afol-apiserver-72b-1 | @ 0x7effb9ba6520 (unknown) (unknown)
afol-apiserver-72b-1 | [2024-03-11 11:35:28,030 E 1 3935] logging.cc:361: SIGABRT received at time=1710156928 on cpu 62
afol-apiserver-72b-1 | [2024-03-11 11:35:28,030 E 1 3935] logging.cc:361: PC: @ 0x7effb9bfa9fc (unknown) pthread_kill
afol-apiserver-72b-1 | [2024-03-11 11:35:28,030 E 1 3935] logging.cc:361: @ 0x7effb9ba6520 (unknown) (unknown)
afol-apiserver-72b-1 | Fatal Python error: Aborted
afol-apiserver-72b-1 |
afol-apiserver-72b-1 |
afol-apiserver-72b-1 |
afol-apiserver-72b-1 |
afol-apiserver-72b-1 | Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, markupsafe._speedups, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy_backends.cuda.stream, cupy_backends.cuda.libs.cublas, cupy_backends.cuda.libs.cusolver, cupy_backends.cuda._softlink, cupy_backends.cuda.libs.cusparse, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.jitify, cupy.cuda.pinned_memory, cupy_backends.cuda.libs.curand, cupy_backends.cuda.libs.profiler, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, 
cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_binary, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupyx.cusolver, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, cupy.cuda.cufft, cupy.fft._cache, cupy.fft._callback, cupy.random._generator_api, cupy.random._bit_generator, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, cupy.lib._polynomial, cupy_backends.cuda.libs.nccl, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, 
scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, httptools.parser.parser, httptools.parser.url_parser, websockets.speedups (total: 161)
afol-apiserver-72b-1 | [failure_signal_handler.cc : 332] RAW: Signal 11 raised at PC=0x7effb9b8c898 while already in AbslFailureSignalHandler()
afol-apiserver-72b-1 | SIGSEGV received at time=1710156928 on cpu 62
afol-apiserver-72b-1 | PC: @ 0x7effb9b8c898 (unknown) abort
afol-apiserver-72b-1 | @ 0x7effb9ba6520 (unknown) (unknown)
afol-apiserver-72b-1 | @ 0x7ef7f9ffe640 (unknown) (unknown)
afol-apiserver-72b-1 | [2024-03-11 11:35:28,036 E 1 3935] logging.cc:361: SIGSEGV received at time=1710156928 on cpu 62
afol-apiserver-72b-1 | [2024-03-11 11:35:28,036 E 1 3935] logging.cc:361: PC: @ 0x7effb9b8c898 (unknown) abort
afol-apiserver-72b-1 | [2024-03-11 11:35:28,038 E 1 3935] logging.cc:361: @ 0x7effb9ba6520 (unknown) (unknown)
afol-apiserver-72b-1 | [2024-03-11 11:35:28,040 E 1 3935] logging.cc:361: @ 0x7ef7f9ffe640 (unknown) (unknown)
afol-apiserver-72b-1 | Fatal Python error: Segmentation fault
afol-apiserver-72b-1 |
afol-apiserver-72b-1 |
afol-apiserver-72b-1 |
afol-apiserver-72b-1 |
afol-apiserver-72b-1 | Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7efd8e528670>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7efd843ef790>)
afol-apiserver-72b-1 | handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7efd8e528670>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7efd843ef790>)>
afol-apiserver-72b-1 | Traceback (most recent call last):
afol-apiserver-72b-1 | File "/workspace/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish
afol-apiserver-72b-1 | task.result()
afol-apiserver-72b-1 | File "/workspace/vllm/engine/async_llm_engine.py", line 414, in run_engine_loop
afol-apiserver-72b-1 | has_requests_in_progress = await self.engine_step()
afol-apiserver-72b-1 | File "/workspace/vllm/engine/async_llm_engine.py", line 393, in engine_step
afol-apiserver-72b-1 | request_outputs = await self.engine.step_async()
afol-apiserver-72b-1 | File "/workspace/vllm/engine/async_llm_engine.py", line 189, in step_async
afol-apiserver-72b-1 | all_outputs = await self._run_workers_async(
afol-apiserver-72b-1 | File "/workspace/vllm/engine/async_llm_engine.py", line 276, in _run_workers_async
afol-apiserver-72b-1 | all_outputs = await asyncio.gather(*coros)
afol-apiserver-72b-1 | File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
afol-apiserver-72b-1 | result = self.fn(*self.args, **self.kwargs)
afol-apiserver-72b-1 | File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
afol-apiserver-72b-1 | return func(*args, **kwargs)
afol-apiserver-72b-1 | File "/workspace/vllm/worker/worker.py", line 209, in execute_model
afol-apiserver-72b-1 | broadcast_tensor_dict(data, src=0)
afol-apiserver-72b-1 | File "/workspace/vllm/model_executor/parallel_utils/communication_op.py", line 173, in broadcast_tensor_dict
afol-apiserver-72b-1 | torch.distributed.broadcast_object_list([metadata_list],
afol-apiserver-72b-1 | File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
afol-apiserver-72b-1 | return func(*args, **kwargs)
afol-apiserver-72b-1 | File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 2603, in broadcast_object_list
afol-apiserver-72b-1 | broadcast(object_sizes_tensor, src=src, group=group)
afol-apiserver-72b-1 | File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
afol-apiserver-72b-1 | return func(*args, **kwargs)
afol-apiserver-72b-1 | File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 1906, in broadcast
afol-apiserver-72b-1 | work = default_pg.broadcast([tensor], opts)
afol-apiserver-72b-1 | RuntimeError: NCCL communicator was aborted on rank 0.
afol-apiserver-72b-1 | The above exception was the direct cause of the following exception:
afol-apiserver-72b-1 |
afol-apiserver-72b-1 |
afol-apiserver-72b-1 | Traceback (most recent call last):
afol-apiserver-72b-1 | File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
afol-apiserver-72b-1 | File "/workspace/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
afol-apiserver-72b-1 | raise exc
afol-apiserver-72b-1 | File "/workspace/vllm/engine/async_llm_engine.py", line 33, in _raise_exception_on_finish
afol-apiserver-72b-1 | raise AsyncEngineDeadError(
afol-apiserver-72b-1 | vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
afol-apiserver-72b-1 | INFO 03-11 11:35:28 async_llm_engine.py:133] Aborted request cmpl-f1efde9ef6c843f980f3d3d6d15c1060.
afol-apiserver-72b-1 | regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, markupsafe._speedups, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy_backends.cuda.stream, cupy_backends.cuda.libs.cublas, cupy_backends.cuda.libs.cusolver, cupy_backends.cuda._softlink, cupy_backends.cuda.libs.cusparse, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.jitify, cupy.cuda.pinned_memory, cupy_backends.cuda.libs.curand, cupy_backends.cuda.libs.profiler, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, 
cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_binary, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupyx.cusolver, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, cupy.cuda.cufft, cupy.fft._cache, cupy.fft._callback, cupy.random._generator_api, cupy.random._bit_generator, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, cupy.lib._polynomial, cupy_backends.cuda.libs.nccl, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, 
scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct, httptools.parser.parser, httptools.parser.url_parser, websockets.speedups (total: 161)
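
For triage, the watchdog timeout entries in the log above can be pulled out mechanically to see which rank stalled on which collective. The following is only a minimal sketch, assuming each entry is intact on one line and follows the `ProcessGroupNCCL` message format seen in this log; `parse_watchdog_timeouts` and `WATCHDOG_RE` are hypothetical names, not part of vLLM or PyTorch.

```python
import re
from typing import Dict, List, Union

# Matches the "Watchdog caught collective operation timeout" entries as they
# appear in this issue's log. The format is an assumption taken from the log
# itself and may differ between PyTorch versions.
WATCHDOG_RE = re.compile(
    r"\[Rank (?P<rank>\d+)\] Watchdog caught collective operation timeout: "
    r"WorkNCCL\(SeqNum=(?P<seq_num>\d+), OpType=(?P<op_type>\w+), "
    r"NumelIn=(?P<numel_in>\d+), NumelOut=(?P<numel_out>\d+), "
    r"Timeout\(ms\)=(?P<timeout_ms>\d+)\) "
    r"ran for (?P<elapsed_ms>\d+) milliseconds"
)


def parse_watchdog_timeouts(log_text: str) -> List[Dict[str, Union[int, str]]]:
    """Return one dict per watchdog timeout entry found in log_text."""
    events: List[Dict[str, Union[int, str]]] = []
    for match in WATCHDOG_RE.finditer(log_text):
        fields = match.groupdict()
        events.append({
            # Every field is numeric except the collective's name.
            key: (value if key == "op_type" else int(value))
            for key, value in fields.items()
        })
    return events


if __name__ == "__main__":
    line = (
        "[E ProcessGroupNCCL.cpp:475] [Rank 0] Watchdog caught collective "
        "operation timeout: WorkNCCL(SeqNum=9388514, OpType=BROADCAST, "
        "NumelIn=1, NumelOut=1, Timeout(ms)=1800000) ran for 1800872 "
        "milliseconds before timing out."
    )
    for event in parse_watchdog_timeouts(line):
        print(event["rank"], event["op_type"], event["elapsed_ms"])
```

Because the same timeout is repeated inside the "watchdog thread terminated" and "Unhandled exception" messages, a full log yields duplicate events; deduplicating on `(rank, seq_num)` would be a reasonable next step.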

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] commented 2 days ago

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!