triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

[BUG] coredump when process exit triggered after TRITONSERVER_ServerDelete #443

Closed hzlushiliang closed 1 month ago

hzlushiliang commented 1 month ago

Environment

- CPU architecture: x86_64
- CPU/Host memory size: 16G
- GPU properties
- Libraries
- NVIDIA driver version: 525.105.17
- OS: CentOS 8

Reproduction Steps

- I have an inference server developed directly on top of triton-core, with functionality similar to triton-server but serving through another protocol (not gRPC, not HTTP).
- The process tries to exit normally and TRITONSERVER_ServerDelete is invoked (a sketch of the assumed shutdown path is shown below).
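For context, this is roughly the shutdown path being described, sketched against the Triton in-process C API. It is a simplified illustration, not the actual server code; the header path, the "/models" repository path, and the option values are placeholders.

// Minimal sketch of an in-process server shutting down (placeholders noted above).
#include "triton/core/tritonserver.h"  // header location may differ per build

#include <cstdio>

// Log-and-free helper for TRITONSERVER_Error; real code would handle errors properly.
static void Check(TRITONSERVER_Error* err)
{
    if (err != nullptr) {
        std::fprintf(stderr, "triton error: %s\n", TRITONSERVER_ErrorMessage(err));
        TRITONSERVER_ErrorDelete(err);
    }
}

int main()
{
    TRITONSERVER_ServerOptions* options = nullptr;
    Check(TRITONSERVER_ServerOptionsNew(&options));
    Check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(options, "/models"));  // placeholder path

    TRITONSERVER_Server* server = nullptr;
    Check(TRITONSERVER_ServerNew(&server, options));
    Check(TRITONSERVER_ServerOptionsDelete(options));

    // ... serve requests over the custom (non-gRPC, non-HTTP) protocol ...

    // Normal exit path: the coredump described below is triggered during
    // process teardown after this delete, while the tensorrtllm backend is
    // being unloaded.
    Check(TRITONSERVER_ServerStop(server));
    Check(TRITONSERVER_ServerDelete(server));
    return 0;
}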

Expected Behavior

- The process should exit normally and gracefully.

Actual Behavior

- A coredump occurred, with the stack trace below:

(gdb) bt
#0  std::_List_const_iterator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> >::operator++ (this=0x7ff54e0f23e8) at /usr/include/c++/9/bits/stl_list.h:303
#1  0x00007ff66729dced in std::__distance<std::_List_const_iterator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> > > (__first=..., __last=...) at /usr/include/c++/9/bits/stl_iterator_base_funcs.h:89
#2  0x00007ff667296ccc in std::distance<std::_List_const_iterator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> > > (__first=..., __last=...) at /usr/include/c++/9/bits/stl_iterator_base_funcs.h:141
#3  0x00007ff667293a91 in std::list<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem>, std::allocator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> > >::_M_node_count (this=0x7ff576b5d280)
    at /usr/include/c++/9/bits/stl_list.h:658
#4  0x00007ff66729140c in std::list<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem>, std::allocator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> > >::size (this=0x7ff576b5d280)
    at /usr/include/c++/9/bits/stl_list.h:1057
#5  0x00007ff66728ff8b in triton::backend::inflight_batcher_llm::WorkItemsQueue::numPendingWorkItems (this=0x7ff576b5d280) at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/work_items_queue.h:90
#6  0x00007ff66728ad8d in triton::backend::inflight_batcher_llm::ModelInstanceState::get_inference_requests (this=0x7ff576b5ceb0, max_num_requests=256) at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.cc:333
#7  0x00007ff667288d18 in triton::backend::inflight_batcher_llm::ModelInstanceState::<lambda(int)>::operator()(int) const (__closure=0x7ff576f81118, max_num_requests=256)
    at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.cc:229
#8  0x00007ff66728dd32 in std::_Function_handler<std::list<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest> > >(int), triton::backend::inflight_batcher_llm::ModelInstanceState::ModelInstanceState(triton::backend::inflight_batcher_llm::ModelState*, TRITONBACKEND_ModelInstance*)::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&) (__functor=..., __args#0=@0x7ff54e0f27b0: 256)
    at /usr/include/c++/9/bits/std_function.h:286
#9  0x00007ff6672d1bb4 in tensorrt_llm::batch_manager::GptManager::fetchNewRequests() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#10 0x00007ff6672d3296 in tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#11 0x00007ff7e0b638a0 in std::execute_native_thread_routine (__p=0x7ff576f83760) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#12 0x00007ff7e11f62de in start_thread () from /lib64/libpthread.so.0
#13 0x00007ff7e05b5e83 in clone () from /lib64/libc.so.6

Additional Notes

As shown by the stack above, this appears to be a destruction-order problem at process exit. In detail:

- When the model is loaded, ModelInstanceState is created, which creates a GptManager member instance. Execution switches into the libtensorrt_llm_batch_manager_static library, which spawns a new thread to run decoupled_execution_loop; inside that loop, the ModelInstanceState instance is referenced again to invoke get_inference_requests.
- When the process tries to exit, the mWorkItemsQueue member of ModelInstanceState is destructed before the mBatchManager member, while the child thread is still referencing mWorkItemsQueue, leading to a coredump. A minimal sketch of this hazard follows.
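To make the hazard concrete, here is a minimal, self-contained C++ sketch. Queue, Manager, and Owner are hypothetical stand-ins for WorkItemsQueue, GptManager, and ModelInstanceState; this is a simplified illustration, not the backend's actual implementation.

// Hypothetical illustration of the destruction-order hazard (not the backend's real code).
#include <atomic>
#include <chrono>
#include <cstddef>
#include <list>
#include <memory>
#include <thread>

struct Queue  // stands in for WorkItemsQueue
{
    std::size_t numPending() const { return items.size(); }
    std::list<int> items;
};

struct Manager  // stands in for GptManager: owns a worker thread polling a queue it does not own
{
    explicit Manager(Queue* q) : queue(q), worker([this] { loop(); }) {}
    ~Manager()
    {
        stop = true;
        worker.join();  // the worker is only stopped and joined here
    }
    void loop()
    {
        while (!stop) {
            (void)queue->numPending();  // dereferences the externally owned queue
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    Queue* queue;
    std::atomic<bool> stop{false};
    std::thread worker;
};

struct Owner  // stands in for ModelInstanceState
{
    Owner()
    {
        queue = std::make_unique<Queue>();
        manager = std::make_unique<Manager>(queue.get());
    }
    // Non-static data members are destroyed in reverse declaration order, so
    // with this ordering `queue` dies before `manager`, while the manager's
    // worker thread may still be calling queue->numPending(): undefined
    // behavior, typically a crash like the one in the backtrace above.
    // Swapping the two declarations (as in the diff below) makes `manager`
    // die first, joining the worker before the queue is destroyed.
    std::unique_ptr<Manager> manager;  // mirrors mBatchManager (declared first)
    std::unique_ptr<Queue> queue;      // mirrors mWorkItemsQueue (declared second)
};

int main()
{
    Owner owner;  // at scope exit, the destruction order triggers the race
}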

I manually modified the source code of tensorrtllm_backend as follows:

/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.h
@@ -124,8 +124,8 @@ private:
     std::string mModelPath;
     bool mIsDecoupled;

-    std::shared_ptr<GptManager> mBatchManager;
     std::unique_ptr<WorkItemsQueue> mWorkItemsQueue;
+    std::shared_ptr<GptManager> mBatchManager;
 };

With the members reordered so that mBatchManager (and with it the GptManager worker thread) is destroyed before mWorkItemsQueue, the coredump stack above is no longer seen, but a new coredump stack shows up:

#0  0x00007ff898d3dd4c in PMPI_Comm_size () from /usr/lib64/openmpi/lib/libmpi.so.40
#1  0x00007ff84f3138ef in tensorrt_llm::mpi::getCommWorldSize() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#2  0x00007ff84f28ad5d in triton::backend::inflight_batcher_llm::ModelInstanceState::get_inference_requests (this=0x7ff74853e5f0, max_num_requests=256) at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.cc:329
#3  0x00007ff84f288d18 in triton::backend::inflight_batcher_llm::ModelInstanceState::<lambda(int)>::operator()(int) const (__closure=0x7ff748213888, max_num_requests=256)
    at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.cc:229
#4  0x00007ff84f28dd32 in std::_Function_handler<std::list<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest> > >(int), triton::backend::inflight_batcher_llm::ModelInstanceState::ModelInstanceState(triton::backend::inflight_batcher_llm::ModelState*, TRITONBACKEND_ModelInstance*)::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&) (__functor=..., __args#0=@0x7ff732b167b0: 256)
    at /usr/include/c++/9/bits/std_function.h:286
#5  0x00007ff84f2d1bb4 in tensorrt_llm::batch_manager::GptManager::fetchNewRequests() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#6  0x00007ff84f2d3296 in tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#7  0x00007ff9b52b48a0 in std::execute_native_thread_routine (__p=0x7ff748215ed0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#8  0x00007ff9b59472de in start_thread () from /lib64/libpthread.so.0
#9  0x00007ff9b4d06e83 in clone () from /lib64/libc.so.6

It seems the new coredump happens inside MPI resources, which again looks like the same kind of destruction-order problem at process exit. Can anybody look into this? Can tensorrtllm_backend exit gracefully? (An illustrative sketch of this class of failure follows.)
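For illustration only, and assuming (not confirmed from the backtrace alone) that MPI has already been finalized by the time the GptManager worker thread calls into it, this standalone program shows the same class of failure:

// Illustration only (not the backend's code): calling most MPI functions after
// MPI_Finalize() is erroneous and typically aborts or crashes, which would
// match the PMPI_Comm_size frame above if finalization races with the
// still-running worker thread during process teardown.
#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();

    int world_size = 0;
    // Erroneous call after finalize; with Open MPI this usually aborts with an
    // error or crashes, similar to the second backtrace.
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    return 0;
}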

thx all.

schetlur-nv commented 1 month ago

@hzlushiliang we have completely rewritten this code path in 0.10 release to be based on the executor API. Can you please re-try in a couple of weeks when the 0.10 release is public? We would prefer not to fix an issue with the old GptManager path. Feel free to reopen this issue if you have further questions.