triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Triton Crash with Signal 11 while using python backend #7400

Closed: burling closed this issue 4 weeks ago

burling commented 3 months ago

Description

After using the Python vLLM backend, Triton crashed with signal 11. The model had been loaded and warmed up for some time before the crash occurred.

Triton Information

What version of Triton are you using?

Are you using the Triton container or did you build it yourself? Yes.
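
The Triton version is not stated above. One quick way to confirm it is to query the server metadata endpoint; the following is a minimal sketch, assuming the default gRPC port 8001 and the `tritonclient[grpc]` Python package (not part of the original report):

```python
# Sketch: report server liveness and version via the gRPC metadata endpoint.
# Assumes tritonclient[grpc] is installed and the server listens on localhost:8001.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")
print("live:", client.is_server_live())

meta = client.get_server_metadata()
print("server:", meta.name, "version:", meta.version)
```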

Trace info:

Signal (11) received.
 0# triton::server::(anonymous namespace)::ErrorSignalHandler(int) at triton_signal.cc:?
 1# 0x00007F2477AC8B50 in /usr/lib64/libc.so.6
 2# 0x00007F24780CE7F2 in /usr/lib64/libm.so.6
 3# 0x00007F24780CF49C in /usr/lib64/libm.so.6
 4# pow in /usr/lib64/libm.so.6
 5# grpc_core::chttp2::TransportFlowControl::PeriodicUpdate() in /opt/tritonserver/bin/tritonserver
 6# finish_bdp_ping_locked(void*, absl::lts_20220623::Status) at chttp2_transport.cc:?
 7# grpc_combiner_continue_exec_ctx() in /opt/tritonserver/bin/tritonserver
 8# grpc_core::ExecCtx::Flush() in /opt/tritonserver/bin/tritonserver
 9# end_worker(grpc_pollset*, grpc_pollset_worker*, grpc_pollset_worker**) at ev_epoll1_linux.cc:?
10# pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) at ev_epoll1_linux.cc:?
11# pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) at ev_posix.cc:?
12# grpc_pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) in /opt/tritonserver/bin/tritonserver
13# cq_next(grpc_completion_queue*, gpr_timespec, void*) at completion_queue.cc:?
14# grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) in /opt/tritonserver/bin/tritonserver
15# triton::server::grpc::InferHandler<inference::GRPCInferenceService::WithAsyncMethod_ServerLive<inference::GRPCInferenceService::WithAsyncMethod_ServerReady<inference::GRPCInferenceService::WithAsyncMethod_ModelReady<inference::GRPCInferenceService::WithAsyncMethod_ServerMetadata<inference::GRPCInferenceService::WithAsyncMethod_ModelMetadata<inference::GRPCInferenceService::WithAsyncMethod_ModelInfer<inference::GRPCInferenceService::WithAsyncMethod_ModelStreamInfer<inference::GRPCInferenceService::WithAsyncMethod_ModelConfig<inference::GRPCInferenceService::WithAsyncMethod_ModelStatistics<inference::GRPCInferenceService::WithAsyncMethod_RepositoryIndex<inference::GRPCInferenceService::WithAsyncMethod_RepositoryModelLoad<inference::GRPCInferenceService::WithAsyncMethod_RepositoryModelUnload<inference::GRPCInferenceService::WithAsyncMethod_SystemSharedMemoryStatus<inference::GRPCInferenceService::WithAsyncMethod_SystemSharedMemoryRegister<inference::GRPCInferenceService::WithAsyncMethod_SystemSharedMemoryUnregister<inference::GRPCInferenceService::WithAsyncMethod_CudaSharedMemoryStatus<inference::GRPCInferenceService::WithAsyncMethod_CudaSharedMemoryRegister<inference::GRPCInferenceService::WithAsyncMethod_CudaSharedMemoryUnregister<inference::GRPCInferenceService::WithAsyncMethod_TraceSetting<inference::GRPCInferenceService::WithAsyncMethod_LogSettings<inference::GRPCInferenceService::Service> > > > > > > > > > > > > > > > > > > >, grpc::ServerAsyncResponseWriter<inference::ModelInferResponse>, inference::ModelInferRequest, inference::ModelInferResponse>::Start()::{lambda()#1}::operator()() const in /opt/tritonserver/bin/tritonserver
16# 0x00007F247849BB13 in /usr/lib64/libstdc++.so.6
17# 0x00007F24787761CA in /usr/lib64/libpthread.so.0
18# clone in /usr/lib64/libc.so.6

Markovvn1w commented 3 months ago

I am getting a very similar problem, though I am not sure it is exactly the same error. I also have a Python decoupled backend. After starting tritonserver I run a stress test that sends a large number of requests to the server. Within the first 10 minutes of testing I quite consistently hit this error, which completely crashes my tritonserver (a rough sketch of the load is included at the end of this comment). Unfortunately I have a custom build of tritonserver based on 24.05, so I don't know how relevant the information is.

E0702 19:02:48.658289 148148 infer_handler.h:187] "[INTERNAL] Attempting to access current response when it is not ready"
Signal (11) received.
0.773678183555603
 0# 0x0000561EA6BD83ED in tritonserver
 1# 0x00007F6E5E5D3090 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x0000561EA6C4DBE4 in tritonserver
 3# 0x0000561EA6C4E740 in tritonserver
 4# 0x0000561EA6C46DFA in tritonserver
 5# 0x0000561EA6C31AB5 in tritonserver
 6# 0x00007F6E5E9D4793 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F6E5EB64609 in /usr/lib/x86_64-linux-gnu/libpthread.so.0
 8# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Segmentation fault (core dumped)

I assume the error occurs because of this check, though I have no idea why it is triggered: https://github.com/triton-inference-server/server/blob/c61d993ff3e3608d9b76e9c89a65e5ce41497d49/src/grpc/infer_handler.h#L183-L192
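
For reference, the stress load is roughly of the following shape. This is only a sketch: the model name, input name, shape, and dtype are placeholders rather than the actual model, and the gRPC streaming API is used because decoupled models return their responses over a stream:

```python
# Rough sketch of a stress load against a decoupled model over gRPC streaming.
# MODEL_NAME, INPUT_NAME, dtype, and shape are placeholders, not the real model.
import queue
from functools import partial

import numpy as np
import tritonclient.grpc as grpcclient

MODEL_NAME = "my_decoupled_model"   # placeholder
INPUT_NAME = "INPUT0"               # placeholder

def on_response(results, result, error):
    # Decoupled models may stream zero or more responses per request;
    # collect results/errors so the stream stays drained.
    results.put(error if error is not None else result)

results = queue.Queue()
client = grpcclient.InferenceServerClient("localhost:8001")
client.start_stream(callback=partial(on_response, results))

for _ in range(10000):  # sustained load, similar in spirit to the stress test
    data = np.random.rand(1, 16).astype(np.float32)
    inp = grpcclient.InferInput(INPUT_NAME, list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    client.async_stream_infer(model_name=MODEL_NAME, inputs=[inp])

client.stop_stream()
```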

sboudouk commented 2 months ago

Do you have any logs about the stub being unhealthy and being restarted prior to this signal 11 crash?

Tabrizian commented 4 weeks ago

Closing due to inactivity. Please provide a full reproducer if you're still running into this issue with the latest version.