triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

triton malloc fail #7308

Open MouseSun846 opened 2 months ago

MouseSun846 commented 2 months ago

Description: Triton crashes at runtime.

(gdb) info stack
#0  0x00007ffff64e4d8b in _int_malloc (av=av@entry=0x7ffe30000020, bytes=bytes@entry=24) at malloc.c:3608
#1  0x00007ffff64e7299 in __GI___libc_malloc (bytes=24) at malloc.c:3066
#2  0x00007ffff6855b39 in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fffe05780e0 in triton::backend::python::PbTensor::LoadFromSharedMemory(std::unique_ptr<triton::backend::python::SharedMemoryManager, std::default_delete<triton::backend::python::SharedMemoryManager> >&, long, bool) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#4  0x00007fffe056c8c5 in triton::backend::python::InferRequest::LoadFromSharedMemory(std::unique_ptr<triton::backend::python::SharedMemoryManager, std::default_delete<triton::backend::python::SharedMemoryManager> >&, long, bool) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#5  0x00007fffe0530060 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest(std::shared_ptr<triton::backend::python::IPCMessage>, bool) () from /opt/tritonserver/backends/python/libtriton_python.so
#6  0x00007fffe05310bf in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) () from /opt/tritonserver/backends/python/libtriton_python.so
#7  0x00007fffe0537cfd in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#8  0x00007ffff79684df in __pthread_once_slow (once_control=0x7ffb8c00a538, init_routine=0x7ffff6880c20 <__once_proxy>) at pthread_once.c:116
#9  0x00007fffe0521447 in std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run() ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#10 0x00007fffe054183c in boost::asio::detail::executor_op<boost::asio::detail::binder0<std::packaged_task<void ()> >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
#11 0x00007fffe053f548 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /opt/tritonserver/backends/python/libtriton_python.so
#12 0x00007fffe053faad in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() () from /opt/tritonserver/backends/python/libtriton_python.so
#13 0x00007fffe0535c54 in boost_asio_detail_posix_thread_function () from /opt/tritonserver/backends/python/libtriton_python.so
#14 0x00007ffff795f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007ffff656c353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) f 2
#2  0x00007ffff6855b39 in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) info frame
Stack level 2, frame at 0x7ffe967f7f00:
 rip = 0x7ffff6855b39 in operator new(unsigned long); saved rip = 0x7fffe05780e0
 called by frame at 0x7ffe967f8000, caller of frame at 0x7ffe967f7ef0
 Arglist at 0x7ffe967f7ee8, args: 
 Locals at 0x7ffe967f7ee8, Previous frame's sp is 0x7ffe967f7f00
 Saved registers:
  rbx at 0x7ffe967f7ef0, rip at 0x7ffe967f7ef8

Triton Information: Triton Inference Server 23.03

How can I fix it?

MouseSun846 commented 1 month ago

Crash stack when using tcmalloc:

Thread 301 "tritonserver" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe937f2000 (LWP 34040)]
0x00007ffff7df6fd3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
(gdb) bt
#0  0x00007ffff7df6fd3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#1  0x00007ffff7df7350 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#2  0x00007ffff68c5ded in triton::core::DynamicBatchScheduler::Enqueue(std::unique_ptr<triton::core::InferenceRequest, std::default_delete<triton::core::InferenceRequest> >&) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#3  0x00007ffff691fdd1 in triton::core::InferenceRequest::Run(std::unique_ptr<triton::core::InferenceRequest, std::default_delete<triton::core::InferenceRequest> >&) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#4  0x00007ffff69d7740 in triton::core::InferenceServer::InferAsync(std::unique_ptr<triton::core::InferenceRequest, std::default_delete<triton::core::InferenceRequest> >&) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#5  0x00007ffff69f3325 in TRITONSERVER_ServerInferAsync () from /opt/tritonserver/bin/../lib/libtritonserver.so
#6  0x00007fffea17e46e in triton::backend::python::RequestExecutor::Infer(std::shared_ptr<triton::backend::python::InferRequest>&, std::shared_ptr<triton::backend::python::InferPayload>&) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#7  0x00007fffea1533d9 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest(std::shared_ptr<triton::backend::python::IPCMessage>, bool) () from /opt/tritonserver/backends/python/libtriton_python.so
#8  0x00007fffea1540bf in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) () from /opt/tritonserver/backends/python/libtriton_python.so
#9  0x00007fffea15acfd in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#10 0x00007ffff77714df in __pthread_once_slow (once_control=0x555565348c58, init_routine=0x7ffff6687c20 <__once_proxy>) at pthread_once.c:116
#11 0x00007fffea144447 in std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run() ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#12 0x00007fffea16483c in boost::asio::detail::executor_op<boost::asio::detail::binder0<std::packaged_task<void ()> >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
#13 0x00007fffea162548 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /opt/tritonserver/backends/python/libtriton_python.so
#14 0x00007fffea162aad in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() () from /opt/tritonserver/backends/python/libtriton_python.so
#15 0x00007fffea158c54 in boost_asio_detail_posix_thread_function () from /opt/tritonserver/backends/python/libtriton_python.so
#16 0x00007ffff7768609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007ffff6375353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

MouseSun846 commented 1 month ago

Thread 327 "tritonserver" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe467fc000 (LWP 60201)]
0x00007fffe0555515 in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_allocate(unsigned int, unsigned long, unsigned long&, void*&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
(gdb) bt
#0  0x00007fffe0555515 in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_allocate(unsigned int, unsigned long, unsigned long&, void*&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
#1  0x00007fffe05773cc in triton::backend::python::PbMemory::Create(std::unique_ptr<triton::backend::python::SharedMemoryManager, std::default_delete<triton::backend::python::SharedMemoryManager> >&, TRITONSERVER_memorytype_enum, long, unsigned long, char*, bool) () from /opt/tritonserver/backends/python/libtriton_python.so
#2  0x00007fffe05776da in triton::backend::python::PbMemory::Create(std::unique_ptr<triton::backend::python::SharedMemoryManager, std::default_delete<triton::backend::python::SharedMemoryManager> >&, std::unique_ptr<triton::backend::BackendMemory, std::default_delete<triton::backend::BackendMemory> >&&, bool) () from /opt/tritonserver/backends/python/libtriton_python.so
#3  0x00007fffe0530241 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest(std::shared_ptr<triton::backend::python::IPCMessage>, bool) () from /opt/tritonserver/backends/python/libtriton_python.so
#4  0x00007fffe05310bf in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) () from /opt/tritonserver/backends/python/libtriton_python.so
#5  0x00007fffe0537cfd in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#6  0x00007ffff79684df in __pthread_once_slow (once_control=0x7ffce400adb8, init_routine=0x7ffff6880c20 <__once_proxy>) at pthread_once.c:116
#7  0x00007fffe0521447 in std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run() ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#8  0x00007fffe054183c in boost::asio::detail::executor_op<boost::asio::detail::binder0<std::packaged_task<void ()> >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
#9  0x00007fffe053f548 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /opt/tritonserver/backends/python/libtriton_python.so
#10 0x00007fffe053faad in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() () from /opt/tritonserver/backends/python/libtriton_python.so
#11 0x00007fffe0535c54 in boost_asio_detail_posix_thread_function () from /opt/tritonserver/backends/python/libtriton_python.so
#12 0x00007ffff795f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007ffff656c353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

MouseSun846 commented 1 month ago

(gdb) bt
#0  0x00007ffff7df6fd3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#1  0x00007ffff7df7350 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#2  0x00007ffff69750f9 in triton::core::ModelRepositoryManager::GetModel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::shared_ptr*) ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so
#3  0x00007ffff69d7df6 in triton::core::InferenceServer::ModelIsReady(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, bool*) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#4  0x00007ffff69e838b in TRITONSERVER_ServerModelIsReady () from /opt/tritonserver/bin/../lib/libtritonserver.so
#5  0x00007fffea17e1b1 in triton::backend::python::RequestExecutor::Infer(std::shared_ptr<triton::backend::python::InferRequest>&, std::shared_ptr<triton::backend::python::InferPayload>&) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#6  0x00007fffea1533d9 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest(std::shared_ptr<triton::backend::python::IPCMessage>, bool) () from /opt/tritonserver/backends/python/libtriton_python.so
#7  0x00007fffea1540bf in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) () from /opt/tritonserver/backends/python/libtriton_python.so
#8  0x00007fffea15acfd in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#9  0x00007ffff77714df in __pthread_once_slow (once_control=0x555558b0bfb8, init_routine=0x7ffff6687c20 <__once_proxy>) at pthread_once.c:116
#10 0x00007fffea144447 in std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run() ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#11 0x00007fffea16483c in boost::asio::detail::executor_op<boost::asio::detail::binder0<std::packaged_task<void ()> >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
#12 0x00007fffea162548 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /opt/tritonserver/backends/python/libtriton_python.so
#13 0x00007fffea162aad in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() () from /opt/tritonserver/backends/python/libtriton_python.so
#14 0x00007fffea158c54 in boost_asio_detail_posix_thread_function () from /opt/tritonserver/backends/python/libtriton_python.so
#15 0x00007ffff7768609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x00007ffff6375353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) f 2
#2  0x00007ffff69750f9 in triton::core::ModelRepositoryManager::GetModel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::shared_ptr*) ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so

MouseSun846 commented 1 month ago

0x00007ffff7df6fd3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
(gdb) bt
#0  0x00007ffff7df6fd3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#1  0x00007ffff7df7350 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#2  0x00007fffea15a437 in std::_Function_base::_Base_manager<std::_Bind<void (triton::backend::python::ModelInstanceState::*(triton::backend::python::ModelInstanceState*, std::_Placeholder<1>))(std::unique_ptr<triton::backend::python::InferResponse, std::default_delete<triton::backend::python::InferResponse> >)> >::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) () from /opt/tritonserver/backends/python/libtriton_python.so
#3  0x00007fffea1533a2 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest(std::shared_ptr<triton::backend::python::IPCMessage>, bool) () from /opt/tritonserver/backends/python/libtriton_python.so
#4  0x00007fffea1540bf in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) () from /opt/tritonserver/backends/python/libtriton_python.so
#5  0x00007fffea15acfd in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#6  0x00007ffff77714df in __pthread_once_slow (once_control=0x55555efbff68, init_routine=0x7ffff6687c20 <__once_proxy>) at pthread_once.c:116
#7  0x00007fffea144447 in std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run() ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#8  0x00007fffea16483c in boost::asio::detail::executor_op<boost::asio::detail::binder0<std::packaged_task<void ()> >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
#9  0x00007fffea162548 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /opt/tritonserver/backends/python/libtriton_python.so
#10 0x00007fffea162aad in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() () from /opt/tritonserver/backends/python/libtriton_python.so
#11 0x00007fffea158c54 in boost_asio_detail_posix_thread_function () from /opt/tritonserver/backends/python/libtriton_python.so
#12 0x00007ffff7768609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007ffff6375353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) f 3
#3  0x00007fffea1533a2 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest(std::shared_ptr<triton::backend::python::IPCMessage>, bool) () from /opt/tritonserver/backends/python/libtriton_python.so

MouseSun846 commented 1 month ago

(gdb) bt
#0  tcache_get (tc_idx=<optimized out>) at malloc.c:2937
#1  __GI___libc_malloc (bytes=16) at malloc.c:3051
#2  0x00007ffff6855b39 in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fffe055d28b in void std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >::emplace_back<std::shared_ptr >(std::shared_ptr&&) () from /opt/tritonserver/backends/python/libtriton_python.so
#4  0x00007fffe055c0ed in triton::backend::python::InferResponseComplete(TRITONSERVER_InferenceResponse*, unsigned int, void*) () from /opt/tritonserver/backends/python/libtriton_python.so
#5  0x00007ffff6b26654 in triton::core::InferenceResponse::Send(std::unique_ptr<triton::core::InferenceResponse, std::default_delete<triton::core::InferenceResponse> >&&, unsigned int) ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so
#6  0x00007ffff6a8d5d5 in TRITONBACKEND_ResponseSend () from /opt/tritonserver/bin/../lib/libtritonserver.so
#7  0x00007fffe6064319 in triton::backend::tensorrt::ModelInstanceState::ProcessResponse() () from /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
#8  0x00007ffff6881de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff795f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#10 0x00007ffff656c353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

MouseSun846 commented 1 month ago

0x00007ffff7e070ae in tc_newarray () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
(gdb) bt
#0  0x00007ffff7e070ae in tc_newarray () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#1  0x00007fffea16044d in std::_Function_base::_Base_manager<std::_Bind<void (triton::backend::python::ModelInstanceState::*(triton::backend::python::ModelInstanceState*, std::_Placeholder<1>))(std::unique_ptr<triton::backend::python::InferResponse, std::default_delete<triton::backend::python::InferResponse> >)> >::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#2  0x00007fffea19028f in triton::backend::python::InferPayload::InferPayload(bool, std::function<void (std::unique_ptr<triton::backend::python::InferResponse, std::default_delete<triton::backend::python::InferResponse> >)>) () from /opt/tritonserver/backends/python/libtriton_python.so
#3  0x00007fffea159386 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest(std::shared_ptr<triton::backend::python::IPCMessage>, bool) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#4  0x00007fffea15a0bf in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) () from /opt/tritonserver/backends/python/libtriton_python.so
#5  0x00007fffea160cfd in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#6  0x00007ffff77744df in __pthread_once_slow (once_control=0x55555862dec8, init_routine=0x7ffff668ac20 <__once_proxy>) at pthread_once.c:116
#7  0x00007fffea14a447 in std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run() () from /opt/tritonserver/backends/python/libtriton_python.so
#8  0x00007fffea16a83c in boost::asio::detail::executor_op<boost::asio::detail::binder0<std::packaged_task<void ()> >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#9  0x00007fffea168548 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#10 0x00007fffea168aad in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() () from /opt/tritonserver/backends/python/libtriton_python.so
#11 0x00007fffea15ec54 in boost_asio_detail_posix_thread_function ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#12 0x00007ffff776b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007ffff6378353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
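
The backtraces above crash in different allocators (glibc malloc, tcmalloc, boost::interprocess), so it may help to check which Triton functions they have in common. The following is a minimal triage sketch, not part of Triton or gdb: `frame_counts` and both regular expressions are hypothetical helpers that assume each frame line starts with an optional `#` and a frame number, as in the pasted gdb output.

```python
import re
from collections import Counter

# A gdb frame line starts with "#3  ..." (or "3 ..." if the "#" was lost in pasting).
FRAME_START = re.compile(r"#?\d+\s")
# First "triton::...::name" that is immediately followed by "(" on the line.
TRITON_FUNC = re.compile(r"triton::(?:\w+::)*\w+(?=\()")


def triton_frames(backtrace: str) -> list[str]:
    """Return the Triton function named on each frame line, in stack order."""
    frames = []
    for line in backtrace.splitlines():
        line = line.strip()
        if not FRAME_START.match(line):
            continue  # skip "(gdb) bt", "from /path/lib.so" continuation lines, etc.
        match = TRITON_FUNC.search(line)
        if match:
            frames.append(match.group(0))
    return frames


def frame_counts(backtraces: list[str]) -> Counter:
    """Count, for each Triton function, how many backtraces contain it."""
    counts: Counter = Counter()
    for backtrace in backtraces:
        # Use a set so a function is counted once per backtrace.
        counts.update(set(triton_frames(backtrace)))
    return counts
```

Passing each pasted backtrace as one string and calling `frame_counts(...).most_common()` lists the functions that appear in the most crashes. A function that shows up in nearly every trace is a reasonable place to start looking for heap corruption, though a shared frame alone does not prove it is the cause.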

MouseSun846 commented 1 month ago

I need your help! @Tabrizian

statiraju commented 1 month ago

@MouseSun846 Can you please help us with detailed steps to repro this crash?

MouseSun846 commented 1 month ago

> @MouseSun846 can you please help us with detailed steps to repro this crash.

With a model served by the Python backend, sending image inference requests to the Triton server continuously for more than 20 minutes produces the crashes shown above.
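
The reproduction described above amounts to a soak test: send requests continuously for a fixed time and note when failures begin. Below is a minimal sketch of such a driver. `run_soak` and `infer_once` are hypothetical names, not part of any Triton client API. `infer_once` stands for whatever function sends one image inference request to the Python-backend model and raises on error.

```python
import time
from typing import Callable, Optional


def run_soak(
    infer_once: Callable[[], None],
    duration_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call infer_once repeatedly until duration_s has elapsed.

    Returns the number of successful and failed calls, plus the elapsed
    time (in seconds) at which the first failure was recorded.
    """
    start = clock()
    ok = 0
    failed = 0
    first_failure_s: Optional[float] = None
    while clock() - start < duration_s:
        try:
            infer_once()  # hypothetical: one request against the server
            ok += 1
        except Exception:
            failed += 1
            if first_failure_s is None:
                first_failure_s = clock() - start
    return {"ok": ok, "failed": failed, "first_failure_s": first_failure_s}
```

For the scenario in this issue, a call such as `run_soak(send_image_request, duration_s=30 * 60)` (with `send_image_request` supplied by the reporter) would cover the 20-minute window. The time of the first failure can then be matched against the server log or core dump.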