Open JindrichD opened 3 months ago
Hi @JindrichD, thanks for sharing such a detailed issue. Can you try to reproduce this on the 24.07 release? There were recently some changes to how responses are written for decoupled models, and it's possible this may resolve the issue you are seeing: https://github.com/triton-inference-server/server/pull/7404.
CC @kthui for viz
Hey, unfortunately 24.07 crashed as well. I did not do the debug build (yet), so I don't have the full backtrace, but the release version crashed for me.
I built r24.07 with build.py using the debug flag, ran it under gdb, and got this backtrace:
#0 0x00005c9ffbc0467c in triton::server::grpc::InferHandlerState<grpc::ServerAsyncResponseWriter<inference::ModelInferResponse>, inference::ModelInferRequest, inference::ModelInferResponse>::Context::IsCancelled (this=0x0)
at /workspace/src/grpc/infer_handler.h:669
#1 0x00005c9ffbc00b8a in triton::server::grpc::InferHandlerState<grpc::ServerAsyncResponseWriter<inference::ModelInferResponse>, inference::ModelInferRequest, inference::ModelInferResponse>::IsGrpcContextCancelled (this=0x7332757b7f70)
at /workspace/src/grpc/infer_handler.h:1039
#2 0x00005c9ffbbfa931 in triton::server::grpc::ModelInferHandler::Process (this=0x5c9ffe984cf0, state=0x7332757b7f70, rpc_ok=true) at /workspace/src/grpc/infer_handler.cc:706
#3 0x00005c9ffbbe1663 in _ZZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS4_27WithAsyncMethod_ServerReadyINS4_26WithAsyncMethod_ModelReadyINS4_30WithAsyncMethod_ServerMetadataINS4_29WithAsyncMethod_ModelMetadataINS4_26WithAsyncMethod_ModelInferINS4_32WithAsyncMethod_ModelStreamInferINS4_27WithAsyncMethod_ModelConfigINS4_31WithAsyncMethod_ModelStatisticsINS4_31WithAsyncMethod_RepositoryIndexINS4_35WithAsyncMethod_RepositoryModelLoadINS4_37WithAsyncMethod_RepositoryModelUnloadINS4_40WithAsyncMethod_SystemSharedMemoryStatusINS4_42WithAsyncMethod_SystemSharedMemoryRegisterINS4_44WithAsyncMethod_SystemSharedMemoryUnregisterINS4_38WithAsyncMethod_CudaSharedMemoryStatusINS4_40WithAsyncMethod_CudaSharedMemoryRegisterINS4_42WithAsyncMethod_CudaSharedMemoryUnregisterINS4_28WithAsyncMethod_TraceSettingINS4_27WithAsyncMethod_LogSettingsINS4_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS3_18ModelInferResponseEEENS3_17ModelInferRequestES1C_E5StartEvENKUlvE_clEv (__closure=0x5c9ffe9e4a98) at /workspace/src/grpc/infer_handler.h:1316
#4 0x00005c9ffbbf59a5 in _ZSt13__invoke_implIvZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS5_27WithAsyncMethod_ServerReadyINS5_26WithAsyncMethod_ModelReadyINS5_30WithAsyncMethod_ServerMetadataINS5_29WithAsyncMethod_ModelMetadataINS5_26WithAsyncMethod_ModelInferINS5_32WithAsyncMethod_ModelStreamInferINS5_27WithAsyncMethod_ModelConfigINS5_31WithAsyncMethod_ModelStatisticsINS5_31WithAsyncMethod_RepositoryIndexINS5_35WithAsyncMethod_RepositoryModelLoadINS5_37WithAsyncMethod_RepositoryModelUnloadINS5_40WithAsyncMethod_SystemSharedMemoryStatusINS5_42WithAsyncMethod_SystemSharedMemoryRegisterINS5_44WithAsyncMethod_SystemSharedMemoryUnregisterINS5_38WithAsyncMethod_CudaSharedMemoryStatusINS5_40WithAsyncMethod_CudaSharedMemoryRegisterINS5_42WithAsyncMethod_CudaSharedMemoryUnregisterINS5_28WithAsyncMethod_TraceSettingINS5_27WithAsyncMethod_LogSettingsINS5_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS4_18ModelInferResponseEEENS4_17ModelInferRequestES1D_E5StartEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_ (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#5 0x00005c9ffbbf5901 in _ZSt8__invokeIZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS5_27WithAsyncMethod_ServerReadyINS5_26WithAsyncMethod_ModelReadyINS5_30WithAsyncMethod_ServerMetadataINS5_29WithAsyncMethod_ModelMetadataINS5_26WithAsyncMethod_ModelInferINS5_32WithAsyncMethod_ModelStreamInferINS5_27WithAsyncMethod_ModelConfigINS5_31WithAsyncMethod_ModelStatisticsINS5_31WithAsyncMethod_RepositoryIndexINS5_35WithAsyncMethod_RepositoryModelLoadINS5_37WithAsyncMethod_RepositoryModelUnloadINS5_40WithAsyncMethod_SystemSharedMemoryStatusINS5_42WithAsyncMethod_SystemSharedMemoryRegisterINS5_44WithAsyncMethod_SystemSharedMemoryUnregisterINS5_38WithAsyncMethod_CudaSharedMemoryStatusINS5_40WithAsyncMethod_CudaSharedMemoryRegisterINS5_42WithAsyncMethod_CudaSharedMemoryUnregisterINS5_28WithAsyncMethod_TraceSettingINS5_27WithAsyncMethod_LogSettingsINS5_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS4_18ModelInferResponseEEENS4_17ModelInferRequestES1D_E5StartEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS1J_DpOS1K_ (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#6 0x00005c9ffbbf5872 in _ZNSt6thread8_InvokerISt5tupleIJZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS7_27WithAsyncMethod_ServerReadyINS7_26WithAsyncMethod_ModelReadyINS7_30WithAsyncMethod_ServerMetadataINS7_29WithAsyncMethod_ModelMetadataINS7_26WithAsyncMethod_ModelInferINS7_32WithAsyncMethod_ModelStreamInferINS7_27WithAsyncMethod_ModelConfigINS7_31WithAsyncMethod_ModelStatisticsINS7_31WithAsyncMethod_RepositoryIndexINS7_35WithAsyncMethod_RepositoryModelLoadINS7_37WithAsyncMethod_RepositoryModelUnloadINS7_40WithAsyncMethod_SystemSharedMemoryStatusINS7_42WithAsyncMethod_SystemSharedMemoryRegisterINS7_44WithAsyncMethod_SystemSharedMemoryUnregisterINS7_38WithAsyncMethod_CudaSharedMemoryStatusINS7_40WithAsyncMethod_CudaSharedMemoryRegisterINS7_42WithAsyncMethod_CudaSharedMemoryUnregisterINS7_28WithAsyncMethod_TraceSettingINS7_27WithAsyncMethod_LogSettingsINS7_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS6_18ModelInferResponseEEENS6_17ModelInferRequestES1F_E5StartEvEUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE (this=0x5c9ffe9e4a98) at /usr/include/c++/11/bits/std_thread.h:259
#7 0x00005c9ffbbf5822 in _ZNSt6thread8_InvokerISt5tupleIJZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS7_27WithAsyncMethod_ServerReadyINS7_26WithAsyncMethod_ModelReadyINS7_30WithAsyncMethod_ServerMetadataINS7_29WithAsyncMethod_ModelMetadataINS7_26WithAsyncMethod_ModelInferINS7_32WithAsyncMethod_ModelStreamInferINS7_27WithAsyncMethod_ModelConfigINS7_31WithAsyncMethod_ModelStatisticsINS7_31WithAsyncMethod_RepositoryIndexINS7_35WithAsyncMethod_RepositoryModelLoadINS7_37WithAsyncMethod_RepositoryModelUnloadINS7_40WithAsyncMethod_SystemSharedMemoryStatusINS7_42WithAsyncMethod_SystemSharedMemoryRegisterINS7_44WithAsyncMethod_SystemSharedMemoryUnregisterINS7_38WithAsyncMethod_CudaSharedMemoryStatusINS7_40WithAsyncMethod_CudaSharedMemoryRegisterINS7_42WithAsyncMethod_CudaSharedMemoryUnregisterINS7_28WithAsyncMethod_TraceSettingINS7_27WithAsyncMethod_LogSettingsINS7_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS6_18ModelInferResponseEEENS6_17ModelInferRequestES1F_E5StartEvEUlvE_EEEclEv (this=0x5c9ffe9e4a98) at /usr/include/c++/11/bits/std_thread.h:266
#8 0x00005c9ffbbf57de in _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS8_27WithAsyncMethod_ServerReadyINS8_26WithAsyncMethod_ModelReadyINS8_30WithAsyncMethod_ServerMetadataINS8_29WithAsyncMethod_ModelMetadataINS8_26WithAsyncMethod_ModelInferINS8_32WithAsyncMethod_ModelStreamInferINS8_27WithAsyncMethod_ModelConfigINS8_31WithAsyncMethod_ModelStatisticsINS8_31WithAsyncMethod_RepositoryIndexINS8_35WithAsyncMethod_RepositoryModelLoadINS8_37WithAsyncMethod_RepositoryModelUnloadINS8_40WithAsyncMethod_SystemSharedMemoryStatusINS8_42WithAsyncMethod_SystemSharedMemoryRegisterINS8_44WithAsyncMethod_SystemSharedMemoryUnregisterINS8_38WithAsyncMethod_CudaSharedMemoryStatusINS8_40WithAsyncMethod_CudaSharedMemoryRegisterINS8_42WithAsyncMethod_CudaSharedMemoryUnregisterINS8_28WithAsyncMethod_TraceSettingINS8_27WithAsyncMethod_LogSettingsINS8_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS7_18ModelInferResponseEEENS7_17ModelInferRequestES1G_E5StartEvEUlvE_EEEEE6_M_runEv (this=0x5c9ffe9e4a90) at /usr/include/c++/11/bits/std_thread.h:211
#9 0x000073347eab0253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000073347d87dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x000073347d90ea04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
@JindrichD, @RamonPessoa, We have addressed this issue in the 24.09 release. Please try the latest Tritonserver 24.09 and let us know if the issue persists. PR: https://github.com/triton-inference-server/server/pull/7617
Description
Triton receives a SIGSEGV while handling traffic. The last thing it wrote out was:
E0723 11:57:36.328641 1 infer_handler.h:187] [INTERNAL] Attempting to access current response when it is not ready
With the debug build, the backtrace looks like this:
The problem is that on frame 5 the response_queue is empty and state->response_queue_->GetCurrentResponse() returns a null pointer, which is then dereferenced when the response is written, here: https://github.com/triton-inference-server/server/blob/r24.06/src/grpc/infer_handler.h#L867

I have not yet gotten a handle on how the server works internally, but it clearly seems that the code expects a message in the response queue that should be sent out, yet the queue is empty. (When I patched the code to write the message only if it was not null, the server did not crash, but the client never received the response it was waiting for.) Any hints on how to debug this further, or where to focus, would be much appreciated.
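For illustration, here is a minimal, self-contained C++ sketch of what I think is happening and of the guard I tried. None of this is Triton code; the queue class and names below are made up to mirror the shape of the problem.

```cpp
// Sketch (not Triton code): a writer assumes the response queue always holds
// a "current" response, but if a racing path has drained the queue,
// GetCurrentResponse() returns null and an unchecked dereference crashes.
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <string>

struct Response {
  std::string payload;
};

class ResponseQueue {
 public:
  // Front of the queue, or nullptr when nothing is queued -- mirroring the
  // behavior the backtrace points at.
  std::shared_ptr<Response> GetCurrentResponse() {
    std::lock_guard<std::mutex> lk(mu_);
    if (queue_.empty()) return nullptr;
    return queue_.front();
  }
  void Push(std::shared_ptr<Response> r) {
    std::lock_guard<std::mutex> lk(mu_);
    queue_.push(std::move(r));
  }
  void Clear() {  // e.g. a cancellation path draining the queue
    std::lock_guard<std::mutex> lk(mu_);
    while (!queue_.empty()) queue_.pop();
  }

 private:
  std::mutex mu_;
  std::queue<std::shared_ptr<Response>> queue_;
};

int main() {
  ResponseQueue rq;  // empty, as in the crash

  // Crash pattern: dereferencing the result without checking for nullptr.
  //   auto response = rq.GetCurrentResponse();
  //   std::cout << response->payload;   // SIGSEGV when the queue is empty

  // The guard I tried: only write when there is a current response.
  if (auto response = rq.GetCurrentResponse()) {
    std::cout << response->payload << "\n";
  } else {
    std::cerr << "current response not ready; skipping write\n";
    // Skipping alone leaves the client hanging, so a real fix also has to
    // complete or cancel the request.
  }
  return 0;
}
```

In the server the guard alone is obviously not sufficient, since the request still has to be completed or cancelled so the client is not left waiting, which matches what I observed when I tried it.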
Triton Information
v24.06
Are you using the Triton container or did you build it yourself?
We built it ourselves, either by running python3 compose.py ... or python3 build.py --build-type=Debug ... (scripts from the triton repo).

To Reproduce
We were running one ensemble model (it consists of two models internally) which is not decoupled (i.e. it is request-response based) and one decoupled model to which we send many requests and receive one response back at the end of the session. From where the server crashed, it looks like the decoupled model is causing the trouble.
I was running inferences from several Triton clients (around 10) at the same time. The clients were using gRPC streams and were running inference against both the ensemble and the decoupled model. Each of them ran a different number of requests, then disconnected and started over from the beginning. The crash is not immediate; sometimes it takes hours, so it looks like some race condition. A rough sketch of the client loop follows.
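For reference, here is a rough sketch of the kind of streaming client loop each client runs, assuming stubs generated from Triton's grpc_service.proto; the model name, tensor name, shape, and payload are placeholders, not our real models.

```cpp
// Rough sketch of one client "session": open a gRPC stream, send many
// requests to the decoupled model, read whatever comes back, then disconnect.
#include <grpcpp/grpcpp.h>

#include <iostream>
#include <string>
#include <vector>

#include "grpc_service.grpc.pb.h"  // generated from Triton's grpc_service.proto

int main() {
  auto channel = grpc::CreateChannel("localhost:8001",
                                     grpc::InsecureChannelCredentials());
  auto stub = inference::GRPCInferenceService::NewStub(channel);

  grpc::ClientContext ctx;
  auto stream = stub->ModelStreamInfer(&ctx);

  for (int i = 0; i < 100; ++i) {
    inference::ModelInferRequest req;
    req.set_model_name("my_decoupled_model");  // placeholder model name
    req.set_id(std::to_string(i));

    auto* input = req.add_inputs();
    input->set_name("INPUT0");  // placeholder tensor name
    input->set_datatype("FP32");
    input->add_shape(1);

    std::vector<float> data{0.0f};  // placeholder payload
    req.add_raw_input_contents(
        std::string(reinterpret_cast<const char*>(data.data()),
                    data.size() * sizeof(float)));

    if (!stream->Write(req)) break;
  }
  stream->WritesDone();

  inference::ModelStreamInferResponse resp;
  while (stream->Read(&resp)) {
    std::cout << "got response for id " << resp.infer_response().id() << "\n";
  }
  grpc::Status status = stream->Finish();
  std::cout << "stream finished: " << status.error_message() << "\n";
  return 0;
}
```

In the real clients the stream stays open much longer, the request count differs per client, and the loop restarts after disconnecting, which seems to be what eventually triggers the race.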
Expected behavior
The server does not crash.