triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Triton crashes with SIGSEGV (signal 11) #7472

Open JindrichD opened 3 months ago

JindrichD commented 3 months ago

Description Triton receives a SIGSEGV while handling traffic. The last thing it wrote out was: E0723 11:57:36.328641 1 infer_handler.h:187] "[INTERNAL] Attempting to access current response when it is not ready"

With a debug build, the backtrace looks like this:

(gdb) bt
#0  0x00005608ea663bc4 in grpc::GenericSerialize<grpc::ProtoBufferWriter, inference::ModelStreamInferResponse> (msg=..., bb=0x7efc5be3eb48, own_buffer=0x7efc5cbca16f)
    at /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/impl/codegen/proto_utils.h:54
#1  0x00005608ea662fd2 in grpc::SerializationTraits<inference::ModelStreamInferResponse, void>::Serialize (msg=..., bb=0x7efc5be3eb48, own_buffer=0x7efc5cbca16f)
    at /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/impl/codegen/proto_utils.h:109
#2  0x00005608ea662518 in grpc::internal::CallOpSendMessage::SendMessage<inference::ModelStreamInferResponse> (this=0x7efc5be3eb38, message=..., options=...)
    at /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/impl/codegen/call_op_set.h:390
#3  0x00005608ea66150f in grpc::internal::CallOpSendMessage::SendMessage<inference::ModelStreamInferResponse> (this=0x7efc5be3eb38, message=...) at /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/impl/codegen/call_op_set.h:400
#4  0x00005608ea6605c8 in grpc::ServerAsyncReaderWriter<inference::ModelStreamInferResponse, inference::ModelInferRequest>::Write (this=0x7efc5be3e800, msg=..., tag=0x7efc5bee1200)
    at /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/impl/codegen/async_stream.h:1046
#5  0x00005608ea65e793 in triton::server::grpc::InferHandlerState<grpc::ServerAsyncReaderWriter<inference::ModelStreamInferResponse, inference::ModelInferRequest>, inference::ModelInferRequest, inference::ModelStreamInferResponse>::Context::DecoupledWriteResponse (this=0x7efc5da02bd0, state=0x7efc5bee1200) at /workspace/src/grpc/infer_handler.h:868
#6  0x00005608ea659c2c in triton::server::grpc::ModelStreamInferHandler::Process (this=0x7f0685387e00, state=0x7efc5bee1200, rpc_ok=true) at /workspace/src/grpc/stream_infer_handler.cc:516
#7  0x00005608ea6251c1 in _ZZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS4_27WithAsyncMethod_ServerReadyINS4_26WithAsyncMethod_ModelReadyINS4_30WithAsyncMethod_ServerMetadataINS4_29WithAsyncMethod_ModelMetadataINS4_26WithAsyncMethod_ModelInferINS4_32WithAsyncMethod_ModelStreamInferINS4_27WithAsyncMethod_ModelConfigINS4_31WithAsyncMethod_ModelStatisticsINS4_31WithAsyncMethod_RepositoryIndexINS4_35WithAsyncMethod_RepositoryModelLoadINS4_37WithAsyncMethod_RepositoryModelUnloadINS4_40WithAsyncMethod_SystemSharedMemoryStatusINS4_42WithAsyncMethod_SystemSharedMemoryRegisterINS4_44WithAsyncMethod_SystemSharedMemoryUnregisterINS4_38WithAsyncMethod_CudaSharedMemoryStatusINS4_40WithAsyncMethod_CudaSharedMemoryRegisterINS4_42WithAsyncMethod_CudaSharedMemoryUnregisterINS4_28WithAsyncMethod_TraceSettingINS4_27WithAsyncMethod_LogSettingsINS4_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc23ServerAsyncReaderWriterINS3_24ModelStreamInferResponseENS3_17ModelInferRequestEEES1D_S1C_E5StartEvENKUlvE_clEv (__closure=0x7f06786d11e8) at /workspace/src/grpc/infer_handler.h:1316
#8  0x00005608ea63aa84 in _ZSt13__invoke_implIvZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS5_27WithAsyncMethod_ServerReadyINS5_26WithAsyncMethod_ModelReadyINS5_30WithAsyncMethod_ServerMetadataINS5_29WithAsyncMethod_ModelMetadataINS5_26WithAsyncMethod_ModelInferINS5_32WithAsyncMethod_ModelStreamInferINS5_27WithAsyncMethod_ModelConfigINS5_31WithAsyncMethod_ModelStatisticsINS5_31WithAsyncMethod_RepositoryIndexINS5_35WithAsyncMethod_RepositoryModelLoadINS5_37WithAsyncMethod_RepositoryModelUnloadINS5_40WithAsyncMethod_SystemSharedMemoryStatusINS5_42WithAsyncMethod_SystemSharedMemoryRegisterINS5_44WithAsyncMethod_SystemSharedMemoryUnregisterINS5_38WithAsyncMethod_CudaSharedMemoryStatusINS5_40WithAsyncMethod_CudaSharedMemoryRegisterINS5_42WithAsyncMethod_CudaSharedMemoryUnregisterINS5_28WithAsyncMethod_TraceSettingINS5_27WithAsyncMethod_LogSettingsINS5_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc23ServerAsyncReaderWriterINS4_24ModelStreamInferResponseENS4_17ModelInferRequestEEES1E_S1D_E5StartEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_ (__f=...) at /usr/include/c++/11/bits/invoke.h:61
...

The problem is that in frame 5 the response queue is empty, so state->response_queue_->GetCurrentResponse() returns a null pointer, which the code then attempts to write, here: https://github.com/triton-inference-server/server/blob/r24.06/src/grpc/infer_handler.h#L867

I have not yet worked out how the server operates internally, but the code clearly expects a message in the response queue that should be sent out, yet the queue is empty. (When I patched the code to write the message only if it was non-null, the server no longer crashed, but the client never received the response it was waiting for.) Any hints on how to debug this further, or where to focus, would be much appreciated.
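
To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern as I understand it. Response, ResponseQueue, and WriteResponse below are illustrative stand-ins, not the actual Triton types; the real code is in src/grpc/infer_handler.h.

// Minimal model of the suspected failure (stand-in types, not Triton code).
// An empty queue makes GetCurrentResponse() return nullptr, and the write
// path dereferences that pointer without checking it -> SIGSEGV.
#include <iostream>
#include <queue>
#include <string>

struct Response {
  std::string payload;
};

class ResponseQueue {
 public:
  void Push(Response* r) { q_.push(r); }
  // Mirrors the reported behavior: returns nullptr when the queue is empty.
  Response* GetCurrentResponse() { return q_.empty() ? nullptr : q_.front(); }

 private:
  std::queue<Response*> q_;
};

// Models the write path: uses the current response unconditionally.
void WriteResponse(ResponseQueue& queue) {
  Response* response = queue.GetCurrentResponse();
  std::cout << response->payload << "\n";  // crashes when response == nullptr
}

int main() {
  ResponseQueue queue;   // never filled, like the queue observed in frame 5
  WriteResponse(queue);  // nullptr dereference -> SIGSEGV
  return 0;
}

Adding a null check in WriteResponse avoids the crash but, as noted above, silently drops the response the client is waiting for, so the real fix has to address why the queue is empty in the first place.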

Triton Information v24.06

Are you using the Triton container or did you build it yourself? we built ourselves, either by running python3 compose.py ... or python3 build.py --build-type=Debug ... scripts from the triton repo

To Reproduce We were running one ensemble model (consisting of two models internally) which is not decoupled (i.e., it is request-response based) and one decoupled model, to which we send many requests and receive a single response back at the end of the session. Judging from where the server crashed, it looks like the decoupled model is the one causing trouble.

I was running inferences from several Triton clients (around 10) at the same time. The clients used gRPC streams and ran inference on both the ensemble and the decoupled model. Each of them sent a different number of requests, then disconnected and started again from the beginning. The crash takes some time; it is not immediate and sometimes takes hours, so it looks like some race condition.
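
Since the crash looks timing-dependent, here is a self-contained sketch of the kind of race that could leave the queue empty at write time. The thread roles and names are hypothetical stand-ins, not Triton code: a "cancel" thread, modeling cleanup after a client disconnect, drains the shared queue while the write path is deciding whether a current response still exists.

// Hypothetical model of the suspected race (stand-in names, not Triton code).
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

std::queue<std::string> response_queue;
std::mutex queue_mutex;

int main() {
  response_queue.push("final response");

  // Models cleanup after a client disconnect: drain the queue.
  std::thread cancel([] {
    std::lock_guard<std::mutex> lk(queue_mutex);
    while (!response_queue.empty()) response_queue.pop();
  });

  // Models the write path: whichever thread wins the race decides
  // whether a "current response" still exists.
  std::thread writer([] {
    std::lock_guard<std::mutex> lk(queue_mutex);
    if (response_queue.empty()) {
      // In Triton, GetCurrentResponse() would return nullptr here and
      // the unguarded write would dereference it -> SIGSEGV.
      std::cout << "lost the race: no current response\n";
    } else {
      std::cout << "wrote: " << response_queue.front() << "\n";
      response_queue.pop();
    }
  });

  cancel.join();
  writer.join();
  return 0;
}

Running this repeatedly prints either outcome depending on scheduling, which would be consistent with a crash that only shows up after hours: the window is small, so the bad interleaving is rare.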

Expected behavior The server does not crash.

rmccorm4 commented 3 months ago

Hi @JindrichD, thanks for sharing such a detailed issue. Can you try to reproduce this on the 24.07 release? There were recently some changes to how responses are written for decoupled models, and it's possible this may resolve the issue you are seeing: https://github.com/triton-inference-server/server/pull/7404.

CC @kthui for viz

JindrichD commented 3 months ago

Hey, unfortunately 24.07 crashed as well. I have not done a debug build (yet), so I don't have a full backtrace; nevertheless, the release version crashed for me.

RamonPessoa commented 2 months ago

I built r24.07 with build.py using the debug flag, ran it under gdb, and got this:

#0  0x00005c9ffbc0467c in triton::server::grpc::InferHandlerState<grpc::ServerAsyncResponseWriter<inference::ModelInferResponse>, inference::ModelInferRequest, inference::ModelInferResponse>::Context::IsCancelled (this=0x0)
    at /workspace/src/grpc/infer_handler.h:669
#1  0x00005c9ffbc00b8a in triton::server::grpc::InferHandlerState<grpc::ServerAsyncResponseWriter<inference::ModelInferResponse>, inference::ModelInferRequest, inference::ModelInferResponse>::IsGrpcContextCancelled (this=0x7332757b7f70)
    at /workspace/src/grpc/infer_handler.h:1039
#2  0x00005c9ffbbfa931 in triton::server::grpc::ModelInferHandler::Process (this=0x5c9ffe984cf0, state=0x7332757b7f70, rpc_ok=true) at /workspace/src/grpc/infer_handler.cc:706
#3  0x00005c9ffbbe1663 in _ZZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS4_27WithAsyncMethod_ServerReadyINS4_26WithAsyncMethod_ModelReadyINS4_30WithAsyncMethod_ServerMetadataINS4_29WithAsyncMethod_ModelMetadataINS4_26WithAsyncMethod_ModelInferINS4_32WithAsyncMethod_ModelStreamInferINS4_27WithAsyncMethod_ModelConfigINS4_31WithAsyncMethod_ModelStatisticsINS4_31WithAsyncMethod_RepositoryIndexINS4_35WithAsyncMethod_RepositoryModelLoadINS4_37WithAsyncMethod_RepositoryModelUnloadINS4_40WithAsyncMethod_SystemSharedMemoryStatusINS4_42WithAsyncMethod_SystemSharedMemoryRegisterINS4_44WithAsyncMethod_SystemSharedMemoryUnregisterINS4_38WithAsyncMethod_CudaSharedMemoryStatusINS4_40WithAsyncMethod_CudaSharedMemoryRegisterINS4_42WithAsyncMethod_CudaSharedMemoryUnregisterINS4_28WithAsyncMethod_TraceSettingINS4_27WithAsyncMethod_LogSettingsINS4_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS3_18ModelInferResponseEEENS3_17ModelInferRequestES1C_E5StartEvENKUlvE_clEv (__closure=0x5c9ffe9e4a98) at /workspace/src/grpc/infer_handler.h:1316
#4  0x00005c9ffbbf59a5 in _ZSt13__invoke_implIvZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS5_27WithAsyncMethod_ServerReadyINS5_26WithAsyncMethod_ModelReadyINS5_30WithAsyncMethod_ServerMetadataINS5_29WithAsyncMethod_ModelMetadataINS5_26WithAsyncMethod_ModelInferINS5_32WithAsyncMethod_ModelStreamInferINS5_27WithAsyncMethod_ModelConfigINS5_31WithAsyncMethod_ModelStatisticsINS5_31WithAsyncMethod_RepositoryIndexINS5_35WithAsyncMethod_RepositoryModelLoadINS5_37WithAsyncMethod_RepositoryModelUnloadINS5_40WithAsyncMethod_SystemSharedMemoryStatusINS5_42WithAsyncMethod_SystemSharedMemoryRegisterINS5_44WithAsyncMethod_SystemSharedMemoryUnregisterINS5_38WithAsyncMethod_CudaSharedMemoryStatusINS5_40WithAsyncMethod_CudaSharedMemoryRegisterINS5_42WithAsyncMethod_CudaSharedMemoryUnregisterINS5_28WithAsyncMethod_TraceSettingINS5_27WithAsyncMethod_LogSettingsINS5_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS4_18ModelInferResponseEEENS4_17ModelInferRequestES1D_E5StartEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_ (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#5  0x00005c9ffbbf5901 in _ZSt8__invokeIZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS5_27WithAsyncMethod_ServerReadyINS5_26WithAsyncMethod_ModelReadyINS5_30WithAsyncMethod_ServerMetadataINS5_29WithAsyncMethod_ModelMetadataINS5_26WithAsyncMethod_ModelInferINS5_32WithAsyncMethod_ModelStreamInferINS5_27WithAsyncMethod_ModelConfigINS5_31WithAsyncMethod_ModelStatisticsINS5_31WithAsyncMethod_RepositoryIndexINS5_35WithAsyncMethod_RepositoryModelLoadINS5_37WithAsyncMethod_RepositoryModelUnloadINS5_40WithAsyncMethod_SystemSharedMemoryStatusINS5_42WithAsyncMethod_SystemSharedMemoryRegisterINS5_44WithAsyncMethod_SystemSharedMemoryUnregisterINS5_38WithAsyncMethod_CudaSharedMemoryStatusINS5_40WithAsyncMethod_CudaSharedMemoryRegisterINS5_42WithAsyncMethod_CudaSharedMemoryUnregisterINS5_28WithAsyncMethod_TraceSettingINS5_27WithAsyncMethod_LogSettingsINS5_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS4_18ModelInferResponseEEENS4_17ModelInferRequestES1D_E5StartEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS1J_DpOS1K_ (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#6  0x00005c9ffbbf5872 in _ZNSt6thread8_InvokerISt5tupleIJZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS7_27WithAsyncMethod_ServerReadyINS7_26WithAsyncMethod_ModelReadyINS7_30WithAsyncMethod_ServerMetadataINS7_29WithAsyncMethod_ModelMetadataINS7_26WithAsyncMethod_ModelInferINS7_32WithAsyncMethod_ModelStreamInferINS7_27WithAsyncMethod_ModelConfigINS7_31WithAsyncMethod_ModelStatisticsINS7_31WithAsyncMethod_RepositoryIndexINS7_35WithAsyncMethod_RepositoryModelLoadINS7_37WithAsyncMethod_RepositoryModelUnloadINS7_40WithAsyncMethod_SystemSharedMemoryStatusINS7_42WithAsyncMethod_SystemSharedMemoryRegisterINS7_44WithAsyncMethod_SystemSharedMemoryUnregisterINS7_38WithAsyncMethod_CudaSharedMemoryStatusINS7_40WithAsyncMethod_CudaSharedMemoryRegisterINS7_42WithAsyncMethod_CudaSharedMemoryUnregisterINS7_28WithAsyncMethod_TraceSettingINS7_27WithAsyncMethod_LogSettingsINS7_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS6_18ModelInferResponseEEENS6_17ModelInferRequestES1F_E5StartEvEUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE (this=0x5c9ffe9e4a98) at /usr/include/c++/11/bits/std_thread.h:259
#7  0x00005c9ffbbf5822 in _ZNSt6thread8_InvokerISt5tupleIJZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS7_27WithAsyncMethod_ServerReadyINS7_26WithAsyncMethod_ModelReadyINS7_30WithAsyncMethod_ServerMetadataINS7_29WithAsyncMethod_ModelMetadataINS7_26WithAsyncMethod_ModelInferINS7_32WithAsyncMethod_ModelStreamInferINS7_27WithAsyncMethod_ModelConfigINS7_31WithAsyncMethod_ModelStatisticsINS7_31WithAsyncMethod_RepositoryIndexINS7_35WithAsyncMethod_RepositoryModelLoadINS7_37WithAsyncMethod_RepositoryModelUnloadINS7_40WithAsyncMethod_SystemSharedMemoryStatusINS7_42WithAsyncMethod_SystemSharedMemoryRegisterINS7_44WithAsyncMethod_SystemSharedMemoryUnregisterINS7_38WithAsyncMethod_CudaSharedMemoryStatusINS7_40WithAsyncMethod_CudaSharedMemoryRegisterINS7_42WithAsyncMethod_CudaSharedMemoryUnregisterINS7_28WithAsyncMethod_TraceSettingINS7_27WithAsyncMethod_LogSettingsINS7_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS6_18ModelInferResponseEEENS6_17ModelInferRequestES1F_E5StartEvEUlvE_EEEclEv (this=0x5c9ffe9e4a98) at /usr/include/c++/11/bits/std_thread.h:266
#8  0x00005c9ffbbf57de in _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS8_27WithAsyncMethod_ServerReadyINS8_26WithAsyncMethod_ModelReadyINS8_30WithAsyncMethod_ServerMetadataINS8_29WithAsyncMethod_ModelMetadataINS8_26WithAsyncMethod_ModelInferINS8_32WithAsyncMethod_ModelStreamInferINS8_27WithAsyncMethod_ModelConfigINS8_31WithAsyncMethod_ModelStatisticsINS8_31WithAsyncMethod_RepositoryIndexINS8_35WithAsyncMethod_RepositoryModelLoadINS8_37WithAsyncMethod_RepositoryModelUnloadINS8_40WithAsyncMethod_SystemSharedMemoryStatusINS8_42WithAsyncMethod_SystemSharedMemoryRegisterINS8_44WithAsyncMethod_SystemSharedMemoryUnregisterINS8_38WithAsyncMethod_CudaSharedMemoryStatusINS8_40WithAsyncMethod_CudaSharedMemoryRegisterINS8_42WithAsyncMethod_CudaSharedMemoryUnregisterINS8_28WithAsyncMethod_TraceSettingINS8_27WithAsyncMethod_LogSettingsINS8_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS7_18ModelInferResponseEEENS7_17ModelInferRequestES1G_E5StartEvEUlvE_EEEEE6_M_runEv (this=0x5c9ffe9e4a90) at /usr/include/c++/11/bits/std_thread.h:211
#9  0x000073347eab0253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000073347d87dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x000073347d90ea04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
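
Note that frame #0 above shows this=0x0: IsCancelled is being invoked through a null Context pointer. As a minimal illustration (the Context below is a stand-in, not Triton's type), calling a member function through a null pointer is undefined behavior and typically faults as soon as the function reads a member:

// Stand-in illustration, not Triton's Context. Invoking a member function
// through a null pointer is undefined behavior; it typically segfaults the
// moment the function reads a member, exactly as in frame #0 (this=0x0).
#include <iostream>

struct Context {
  bool cancelled = false;
  bool IsCancelled() const { return cancelled; }  // reads through `this`
};

int main() {
  Context* ctx = nullptr;           // models the handler's context being null
  std::cout << ctx->IsCancelled();  // nullptr dereference -> SIGSEGV
  return 0;
}
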
pskiran1 commented 1 month ago

@JindrichD, @RamonPessoa, we have addressed this issue in the 24.09 release. Please try the latest Tritonserver 24.09 and let us know if the issue persists. PR: https://github.com/triton-inference-server/server/pull/7617