mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] InternalError: Check failed: nwrite != -1 (-1 vs. -1) : Write Error: Broken pipe #2854

Closed · Erxl closed this issue 2 months ago

Erxl commented 2 months ago

🐛 Bug

To Reproduce

(mlcllm) a@aserver:~$ mlc_llm serve llm/mistral-large-instruct-2407-q4f16_1 --host 192.168.1.4
[2024-08-25 13:59:31] INFO auto_device.py:88: Not found device: cuda:0
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:0
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:1
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:2
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:3
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:4
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:5
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:6
[2024-08-25 13:59:33] INFO auto_device.py:79: Found device: rocm:7
[2024-08-25 13:59:34] INFO auto_device.py:88: Not found device: metal:0
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:0
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:1
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:2
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:3
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:4
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:5
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:6
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:7
[2024-08-25 13:59:36] INFO auto_device.py:79: Found device: vulkan:8
[2024-08-25 13:59:38] INFO auto_device.py:88: Not found device: opencl:0
[2024-08-25 13:59:38] INFO auto_device.py:35: Using device: rocm:0
[2024-08-25 13:59:38] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-08-25 13:59:38] INFO jit.py:158: Using cached model lib: /home/a/.cache/mlc_llm/model_lib/cfead2d711f56e44c7fd0fa68bddd3bd.so
[2024-08-25 13:59:38] INFO engine_base.py:180: The selected engine mode is local. We choose small max batch size and KV cache capacity to use less GPU memory.
[2024-08-25 13:59:38] INFO engine_base.py:205: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
[2024-08-25 13:59:38] INFO engine_base.py:210: If you have high concurrent requests and want to maximize the GPU memory utilization, please select mode "server".
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "local", max batch size will be set to 4, max KV cache token capacity will be set to 8192, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "interactive", max batch size will be set to 1, max KV cache token capacity will be set to 41729, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:687: Under mode "server", max batch size will be set to 80, max KV cache token capacity will be set to 41260, prefill chunk size will be set to 2048. 
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:768: The actual engine mode is "local". So max batch size is 4, max KV cache token capacity is 8192, prefill chunk size is 2048.
[13:59:41] /workspace/mlc-llm/cpp/serve/config.cc:773: Estimated total single GPU memory usage: 17995.347 MB (Parameters: 16771.148 MB. KVCache: 778.401 MB. Temporary buffer: 445.798 MB). The actual usage might be slightly larger than the estimated number.
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #0] Loading model to device: rocm:0
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #1] Loading model to device: rocm:1
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #2] Loading model to device: rocm:2
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:150: [Worker #3] Loading model to device: rocm:3
[13:59:41] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:175: Loading parameters...
[==================================================================================================>]  [885/885]
[14:01:06] /workspace/mlc-llm/cpp/multi_gpu/multi_gpu_loader.cc:203: Loading done. Time used: Loading 76.568 s Preprocessing 8.240 s.
INFO:     Started server process [15112]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     192.168.1.9:55521 - "OPTIONS /v1/chat/completions HTTP/1.1" 200 OK
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/a/miniconda3/envs/mlcllm/lib/python3.11/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/workspace/mlc-llm/cpp/serve/threaded_engine.cc", line 182, in mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
  File "/workspace/mlc-llm/cpp/serve/engine.cc", line 650, in mlc::llm::serve::EngineImpl::Step()
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 45, in mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 301, in mlc::llm::serve::NewRequestPrefillActionObj::MatchPrefixCache(mlc::llm::serve::EngineState, mlc::llm::serve::BatchPrefillBaseActionObj::PrefillInput*)
  File "/workspace/mlc-llm/cpp/serve/model.cc", line 642, in mlc::llm::serve::ModelImpl::AddNewSequence(long)
  File "/workspace/mlc-llm/cpp/serve/function_table.cc", line 68, in operator()
tvm.error.InternalError: Traceback (most recent call last):
  9: mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
        at /workspace/mlc-llm/cpp/serve/threaded_engine.cc:182
  8: mlc::llm::serve::EngineImpl::Step()
        at /workspace/mlc-llm/cpp/serve/engine.cc:650
  7: mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:45
  6: mlc::llm::serve::NewRequestPrefillActionObj::MatchPrefixCache(mlc::llm::serve::EngineState, mlc::llm::serve::BatchPrefillBaseActionObj::PrefillInput*)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:301
  5: mlc::llm::serve::ModelImpl::AddNewSequence(long)
        at /workspace/mlc-llm/cpp/serve/model.cc:642
  4: operator()
        at /workspace/mlc-llm/cpp/serve/function_table.cc:68
  3: tvm::runtime::BcastSessionObj::CallWithPacked(tvm::runtime::TVMArgs const&)
  2: tvm::runtime::ProcessSessionObj::BroadcastPacked(tvm::runtime::TVMArgs const&)
  1: tvm::support::Pipe::Write(void const*, unsigned long)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/disco/../../support/pipe.h", line 129
InternalError: Check failed: nwrite != -1 (-1 vs. -1) : Write Error: Broken pipe
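
For reference, the crash above is hit on the first chat completion request after startup (the OPTIONS line is the CORS preflight from the client at 192.168.1.9). A request that should exercise the same code path is sketched below; the port 8000 and the model identifier in the request body are assumptions, since neither appears explicitly in the log.

# Hypothetical reproduction request; port and model field are assumptions.
curl http://192.168.1.4:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llm/mistral-large-instruct-2407-q4f16_1",
        "messages": [{"role": "user", "content": "Hello"}]
      }'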

Expected behavior

Environment

MasterJH5574 commented 2 months ago

On our side this "broken pipe" error does happen occasionally, but rather rarely. We are working on finding the root cause; in the meantime, you can kill the processes and restart the server.
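
For example, something like the following (a rough sketch; the pkill pattern is an assumption, so it is worth first checking with ps aux | grep mlc_llm that nothing else matches):

# Kill any leftover server/worker processes, then restart the server.
pkill -f mlc_llm
mlc_llm serve llm/mistral-large-instruct-2407-q4f16_1 --host 192.168.1.4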