Open MouseSun846 opened 2 months ago
tcmalloc crash stack
Thread 301 "tritonserver" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe937f2000 (LWP 34040)]
0x00007ffff7df6fd3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
(gdb) bt
#0 0x00007ffff7df6fd3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#1 0x00007ffff7df7350 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#2 0x00007ffff68c5ded in triton::core::DynamicBatchScheduler::Enqueue(std::unique_ptr<triton::core::InferenceRequest, std::default_delete<triton::core::InferenceRequest> >&) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#3 0x00007ffff691fdd1 in triton::core::InferenceRequest::Run(std::unique_ptr<triton::core::InferenceRequest, std::default_delete<triton::core::InferenceRequest> >&) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#4 0x00007ffff69d7740 in triton::core::InferenceServer::InferAsync(std::unique_ptr<triton::core::InferenceRequest, std::default_delete<triton::core::InferenceRequest> >&) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#5 0x00007ffff69f3325 in TRITONSERVER_ServerInferAsync () from /opt/tritonserver/bin/../lib/libtritonserver.so
#6 0x00007fffea17e46e in triton::backend::python::RequestExecutor::Infer(std::shared_ptr<triton::backend::python::InferRequest>&, std::shared_ptr<triton::backend::python::InferPayload>&) ()
from /opt/tritonserver/backends/python/libtriton_python.so
#7 0x00007fffea1533d9 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest(std::shared_ptr<triton::backend::python::IPCMessage>, bool) () from /opt/tritonserver/backends/python/libtriton_python.so
#8 0x00007fffea1540bf in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) () from /opt/tritonserver/backends/python/libtriton_python.so
#9 0x00007fffea15acfd in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
from /opt/tritonserver/backends/python/libtriton_python.so
#10 0x00007ffff77714df in __pthread_once_slow (once_control=0x555565348c58, init_routine=0x7ffff6687c20 <__once_proxy>) at pthread_once.c:116
#11 0x00007fffea144447 in std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, bool&)::{lambda()#3}, std::allocator<int>, void ()>::_M_run() ()
from /opt/tritonserver/backends/python/libtriton_python.so
#12 0x00007fffea16483c in boost::asio::detail::executor_op<boost::asio::detail::binder0<std::packaged_task<void ()> >, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
#13 0x00007fffea162548 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /opt/tritonserver/backends/python/libtriton_python.so
#14 0x00007fffea162aad in boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() () from /opt/tritonserver/backends/python/libtriton_python.so
#15 0x00007fffea158c54 in boost_asio_detail_posix_thread_function () from /opt/tritonserver/backends/python/libtriton_python.so
#16 0x00007ffff7768609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007ffff6375353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 327 "tritonserver" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe467fc000 (LWP 60201)]
0x00007fffe0555515 in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_allocate(unsigned int, unsigned long, unsigned long&, void*&, unsigned long) () from /opt/tritonserver/backends/python/libtriton_python.so
(gdb) bt
from /opt/tritonserver/backends/python/libtriton_python.so
from /opt/tritonserver/backends/python/libtriton_python.so
(gdb) bt
from /opt/tritonserver/bin/../lib/libtritonserver.so
from /opt/tritonserver/backends/python/libtriton_python.so
from /opt/tritonserver/backends/python/libtriton_python.so
from /opt/tritonserver/backends/python/libtriton_python.so
(gdb) f 2
from /opt/tritonserver/bin/../lib/libtritonserver.so
0x00007ffff7df6fd3 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
(gdb) bt
from /opt/tritonserver/backends/python/libtriton_python.so
from /opt/tritonserver/backends/python/libtriton_python.so
(gdb) f 3
(gdb) bt
from /opt/tritonserver/bin/../lib/libtritonserver.so
0x00007ffff7e070ae in tc_newarray () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
(gdb) bt
from /opt/tritonserver/backends/python/libtriton_python.so
from /opt/tritonserver/backends/python/libtriton_python.so
from /opt/tritonserver/backends/python/libtriton_python.so
init_routine=0x7ffff668ac20 <__once_proxy>) at pthread_once.c:116
from /opt/tritonserver/backends/python/libtriton_python.so
from /opt/tritonserver/backends/python/libtriton_python.so
from /opt/tritonserver/backends/python/libtriton_python.so
I need your help! @Tabrizian
@MouseSun846 can you please share detailed steps to reproduce this crash?
With the Python backend serving the model, sending image inference requests to the Triton server continuously for more than 20 minutes produces the crashes shown above.
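For anyone trying to reproduce this, here is a rough sketch of a soak loop that matches the description (send image requests back to back for 20+ minutes and stop at the first client-side error). It is not the reporter's actual client: the helper names `run_soak` and `make_triton_sender` are made up for illustration, and the URL, model name, input name, shape, and FP32 datatype are all assumptions to be replaced with the real values. The backtrace goes through `ExecuteBLSRequest`, so the served Python model presumably issues BLS requests itself; this sketch only drives the outer request.

```python
import time
from typing import Callable, Optional


def run_soak(send: Callable[[], None],
             duration_s: float,
             clock: Callable[[], float] = time.monotonic) -> dict:
    """Call send() back to back until duration_s elapses or send() raises."""
    start = clock()
    sent = 0
    error: Optional[str] = None
    while clock() - start < duration_s:
        try:
            send()
        except Exception as exc:
            # A crashed server shows up here as a client-side error.
            error = f"{type(exc).__name__}: {exc}"
            break
        sent += 1
    return {"sent": sent, "elapsed_s": clock() - start, "error": error}


def make_triton_sender(url: str, model_name: str,
                       input_name: str, shape: list) -> Callable[[], None]:
    """Build a send() that posts one random FP32 tensor over HTTP.

    All four arguments are placeholders for the real deployment values.
    """
    # Imported lazily so run_soak can be used without tritonclient installed.
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url=url)
    image = np.random.rand(*shape).astype(np.float32)

    def send() -> None:
        infer_input = httpclient.InferInput(input_name, list(shape), "FP32")
        infer_input.set_data_from_numpy(image)
        client.infer(model_name=model_name, inputs=[infer_input])

    return send
```

With hypothetical values, a 25 minute run would look like `run_soak(make_triton_sender("localhost:8000", "image_model", "INPUT0", [1, 3, 224, 224]), duration_s=25 * 60)`. The returned dict records how many requests completed before the first failure, which is useful to attach alongside the gdb output.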
Description: Triton crashes at runtime.
Triton Information: Triton Inference Server 23.03
How can I fix it?