triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

[BUG] coredump when process exit triggered after TRITONSERVER_ServerDelete #443

Closed hzlushiliang closed 1 month ago

hzlushiliang commented 1 month ago

Environment

- CPU architecture: x86_64
- CPU/Host memory size: 16G
- GPU properties
- Libraries
- NVIDIA driver version: 525.105.17
- OS: CentOS 8

Reproduction Steps

- I have an inference server developed directly on top of triton-core, with functionality similar to triton-server but serving through another protocol (not gRPC, not HTTP).
- The process tries to exit normally and TRITONSERVER_ServerDelete is invoked (a sketch of the assumed shutdown path is shown below).
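For context, this is roughly the shutdown path being described, sketched against the Triton in-process C API. It is a simplified illustration, not the actual server code; the header path, the "/models" repository path, and the option values are placeholders.

// Minimal sketch of an in-process server shutting down (placeholders noted above).
#include "triton/core/tritonserver.h"  // header location may differ per build

#include <cstdio>

// Log-and-free helper for TRITONSERVER_Error; real code would handle errors properly.
static void Check(TRITONSERVER_Error* err)
{
    if (err != nullptr) {
        std::fprintf(stderr, "triton error: %s\n", TRITONSERVER_ErrorMessage(err));
        TRITONSERVER_ErrorDelete(err);
    }
}

int main()
{
    TRITONSERVER_ServerOptions* options = nullptr;
    Check(TRITONSERVER_ServerOptionsNew(&options));
    Check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(options, "/models"));  // placeholder path

    TRITONSERVER_Server* server = nullptr;
    Check(TRITONSERVER_ServerNew(&server, options));
    Check(TRITONSERVER_ServerOptionsDelete(options));

    // ... serve requests over the custom (non-gRPC, non-HTTP) protocol ...

    // Normal exit path: the coredump described below is triggered during
    // process teardown after this delete, while the tensorrtllm backend is
    // being unloaded.
    Check(TRITONSERVER_ServerStop(server));
    Check(TRITONSERVER_ServerDelete(server));
    return 0;
}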

Expected Behavior

- The process should exit normally and gracefully.

Actual Behavior

- A coredump occurred, with the stack trace below:

(gdb) bt
#0  std::_List_const_iterator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> >::operator++ (this=0x7ff54e0f23e8) at /usr/include/c++/9/bits/stl_list.h:303
#1  0x00007ff66729dced in std::__distance<std::_List_const_iterator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> > > (__first=..., __last=...) at /usr/include/c++/9/bits/stl_iterator_base_funcs.h:89
#2  0x00007ff667296ccc in std::distance<std::_List_const_iterator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> > > (__first=..., __last=...) at /usr/include/c++/9/bits/stl_iterator_base_funcs.h:141
#3  0x00007ff667293a91 in std::list<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem>, std::allocator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> > >::_M_node_count (this=0x7ff576b5d280)
    at /usr/include/c++/9/bits/stl_list.h:658
#4  0x00007ff66729140c in std::list<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem>, std::allocator<std::shared_ptr<triton::backend::inflight_batcher_llm::WorkItem> > >::size (this=0x7ff576b5d280)
    at /usr/include/c++/9/bits/stl_list.h:1057
#5  0x00007ff66728ff8b in triton::backend::inflight_batcher_llm::WorkItemsQueue::numPendingWorkItems (this=0x7ff576b5d280) at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/work_items_queue.h:90
#6  0x00007ff66728ad8d in triton::backend::inflight_batcher_llm::ModelInstanceState::get_inference_requests (this=0x7ff576b5ceb0, max_num_requests=256) at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.cc:333
#7  0x00007ff667288d18 in triton::backend::inflight_batcher_llm::ModelInstanceState::<lambda(int)>::operator()(int) const (__closure=0x7ff576f81118, max_num_requests=256)
    at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.cc:229
#8  0x00007ff66728dd32 in std::_Function_handler<std::list<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest> > >(int), triton::backend::inflight_batcher_llm::ModelInstanceState::ModelInstanceState(triton::backend::inflight_batcher_llm::ModelState*, TRITONBACKEND_ModelInstance*)::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&) (__functor=..., __args#0=@0x7ff54e0f27b0: 256)
    at /usr/include/c++/9/bits/std_function.h:286
#9  0x00007ff6672d1bb4 in tensorrt_llm::batch_manager::GptManager::fetchNewRequests() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#10 0x00007ff6672d3296 in tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#11 0x00007ff7e0b638a0 in std::execute_native_thread_routine (__p=0x7ff576f83760) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#12 0x00007ff7e11f62de in start_thread () from /lib64/libpthread.so.0
#13 0x00007ff7e05b5e83 in clone () from /lib64/libc.so.6

Additional Notes

As shown by the stack above, this appears to be a destruction-order problem at process exit. In detail:

- When the model is loaded, ModelInstanceState is created, which creates a GptManager member instance. Execution switches into the libtensorrt_llm_batch_manager_static library, which spawns a new thread to run decoupled_execution_loop; inside that loop, the ModelInstanceState instance is referenced again to invoke get_inference_requests.
- When the process tries to exit, the mWorkItemsQueue member of ModelInstanceState is destructed before the mBatchManager member, while the child thread is still referencing mWorkItemsQueue, leading to a coredump. A minimal sketch of this hazard follows.
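To make the hazard concrete, here is a minimal, self-contained C++ sketch. Queue, Manager, and Owner are hypothetical stand-ins for WorkItemsQueue, GptManager, and ModelInstanceState; this is a simplified illustration, not the backend's actual implementation.

// Hypothetical illustration of the destruction-order hazard (not the backend's real code).
#include <atomic>
#include <chrono>
#include <cstddef>
#include <list>
#include <memory>
#include <thread>

struct Queue  // stands in for WorkItemsQueue
{
    std::size_t numPending() const { return items.size(); }
    std::list<int> items;
};

struct Manager  // stands in for GptManager: owns a worker thread polling a queue it does not own
{
    explicit Manager(Queue* q) : queue(q), worker([this] { loop(); }) {}
    ~Manager()
    {
        stop = true;
        worker.join();  // the worker is only stopped and joined here
    }
    void loop()
    {
        while (!stop) {
            (void)queue->numPending();  // dereferences the externally owned queue
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    Queue* queue;
    std::atomic<bool> stop{false};
    std::thread worker;
};

struct Owner  // stands in for ModelInstanceState
{
    Owner()
    {
        queue = std::make_unique<Queue>();
        manager = std::make_unique<Manager>(queue.get());
    }
    // Non-static data members are destroyed in reverse declaration order, so
    // with this ordering `queue` dies before `manager`, while the manager's
    // worker thread may still be calling queue->numPending(): undefined
    // behavior, typically a crash like the one in the backtrace above.
    // Swapping the two declarations (as in the diff below) makes `manager`
    // die first, joining the worker before the queue is destroyed.
    std::unique_ptr<Manager> manager;  // mirrors mBatchManager (declared first)
    std::unique_ptr<Queue> queue;      // mirrors mWorkItemsQueue (declared second)
};

int main()
{
    Owner owner;  // at scope exit, the destruction order triggers the race
}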

I manually modified the source code of tensorrtllm_backend as follows:

/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.h
@@ -124,8 +124,8 @@ private:
     std::string mModelPath;
     bool mIsDecoupled;

-    std::shared_ptr<GptManager> mBatchManager;
     std::unique_ptr<WorkItemsQueue> mWorkItemsQueue;
+    std::shared_ptr<GptManager> mBatchManager;
 };

With the members reordered so that mBatchManager (and with it the GptManager worker thread) is destroyed before mWorkItemsQueue, the coredump stack above is no longer seen, but a new coredump stack shows up:

#0  0x00007ff898d3dd4c in PMPI_Comm_size () from /usr/lib64/openmpi/lib/libmpi.so.40
#1  0x00007ff84f3138ef in tensorrt_llm::mpi::getCommWorldSize() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#2  0x00007ff84f28ad5d in triton::backend::inflight_batcher_llm::ModelInstanceState::get_inference_requests (this=0x7ff74853e5f0, max_num_requests=256) at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.cc:329
#3  0x00007ff84f288d18 in triton::backend::inflight_batcher_llm::ModelInstanceState::<lambda(int)>::operator()(int) const (__closure=0x7ff748213888, max_num_requests=256)
    at /tmp/build/tensorrtllm_backend/inflight_batcher_llm/src/model_instance_state.cc:229
#4  0x00007ff84f28dd32 in std::_Function_handler<std::list<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::InferenceRequest> > >(int), triton::backend::inflight_batcher_llm::ModelInstanceState::ModelInstanceState(triton::backend::inflight_batcher_llm::ModelState*, TRITONBACKEND_ModelInstance*)::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&) (__functor=..., __args#0=@0x7ff732b167b0: 256)
    at /usr/include/c++/9/bits/std_function.h:286
#5  0x00007ff84f2d1bb4 in tensorrt_llm::batch_manager::GptManager::fetchNewRequests() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#6  0x00007ff84f2d3296 in tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() () from /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
#7  0x00007ff9b52b48a0 in std::execute_native_thread_routine (__p=0x7ff748215ed0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#8  0x00007ff9b59472de in start_thread () from /lib64/libpthread.so.0
#9  0x00007ff9b4d06e83 in clone () from /lib64/libc.so.6

It seems the new coredump happens inside MPI resources, which again looks like the same kind of destruction-order problem at process exit. Can anybody look into this? Can tensorrtllm_backend exit gracefully? (An illustrative sketch of this class of failure follows.)
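For illustration only, and assuming (not confirmed from the backtrace alone) that MPI has already been finalized by the time the GptManager worker thread calls into it, this standalone program shows the same class of failure:

// Illustration only (not the backend's code): calling most MPI functions after
// MPI_Finalize() is erroneous and typically aborts or crashes, which would
// match the PMPI_Comm_size frame above if finalization races with the
// still-running worker thread during process teardown.
#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();

    int world_size = 0;
    // Erroneous call after finalize; with Open MPI this usually aborts with an
    // error or crashes, similar to the second backtrace.
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    return 0;
}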

thx all.

schetlur-nv commented 1 month ago

@hzlushiliang we have completely rewritten this code path in 0.10 release to be based on the executor API. Can you please re-try in a couple of weeks when the 0.10 release is public? We would prefer not to fix an issue with the old GptManager path. Feel free to reopen this issue if you have further questions.