Description
During soak tests with the Triton server we see a segfault (signal 11). When running in debug mode, we get the stack trace at the bottom of this ticket.
We're using the PyTorch backend with implicit state management and sequence batching with the oldest strategy. It looks like a double free of some state, so could it be related to https://github.com/triton-inference-server/server/issues/7117, which had a similar issue?
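For readers unfamiliar with this setup, here is a minimal sketch of the kind of `config.pbtxt` fragment that enables the oldest strategy together with implicit state. It is not the actual model configuration from these soak tests: the tensor names, data type, dims, and candidate count are placeholders chosen for illustration.

```
# Illustrative fragment only; all names and values below are placeholders.
backend: "pytorch"

sequence_batching {
  # "Oldest" strategy: batches are formed from the oldest pending
  # requests across the candidate sequences.
  oldest {
    max_candidate_sequences: 4
  }
  # Implicit state: Triton carries this tensor from one request of a
  # sequence to the next, so the client does not have to send it.
  state [
    {
      input_name: "INPUT_STATE"
      output_name: "OUTPUT_STATE"
      data_type: TYPE_FP32
      dims: [ -1 ]
    }
  ]
}
```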
Triton Information
What version of Triton are you using?
NGC version 24.06
Are you using the Triton container or did you build it yourself?
We built it ourselves.
To Reproduce
Steps to reproduce the behavior.
Not quite sure. We have to run our soak tests for about 5 hours, and we do seem to cancel a few of the connections in that time.
E0822 03:42:21.437627161 65 timer_generic.cc:133] ** Duplicate timer (0x7fa74bc34c08) being added. Closure: (0x7fa74bc34c50), created at: (/tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/grpc-repo/src/grpc/src/cpp/common/alarm.cc:62), scheduled at: ((null):0) **
Signal (6) received.
0# 0x00005587E73EC261 in /opt/tritonserver/bin/tritonserver
1# 0x00007FB341DCB520 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# pthread_kill in /usr/lib/x86_64-linux-gnu/libc.so.6
3# raise in /usr/lib/x86_64-linux-gnu/libc.so.6
4# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
5# 0x00005587E78293D4 in /opt/tritonserver/bin/tritonserver
6# 0x00005587E782A01A in /opt/tritonserver/bin/tritonserver
7# 0x00005587E7829018 in /opt/tritonserver/bin/tritonserver
8# 0x00005587E7685417 in /opt/tritonserver/bin/tritonserver
9# 0x00005587E768430A in /opt/tritonserver/bin/tritonserver
10# 0x00005587E7473826 in /opt/tritonserver/bin/tritonserver
11# 0x00005587E74A04EC in /opt/tritonserver/bin/tritonserver
12# 0x00005587E749C66F in /opt/tritonserver/bin/tritonserver
13# 0x00007FB34276E607 in /opt/tritonserver/bin/../lib/libtritonserver.so
14# 0x00007FB34276E6FE in /opt/tritonserver/bin/../lib/libtritonserver.so
15# 0x00007FB3427554D8 in /opt/tritonserver/bin/../lib/libtritonserver.so
16# 0x00007FB342711E61 in /opt/tritonserver/bin/../lib/libtritonserver.so
17# 0x00007FB342713101 in /opt/tritonserver/bin/../lib/libtritonserver.so
18# 0x00007FB34270F5B7 in /opt/tritonserver/bin/../lib/libtritonserver.so
19# 0x00007FB34270DCF4 in /opt/tritonserver/bin/../lib/libtritonserver.so
20# 0x00007FB34276E607 in /opt/tritonserver/bin/../lib/libtritonserver.so
21# TRITONBACKEND_ResponseSend in /opt/tritonserver/bin/../lib/libtritonserver.so
22# 0x00007FB2A851D271 in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
23# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
24# 0x00007FB3426CDF85 in /opt/tritonserver/bin/../lib/libtritonserver.so
25# 0x00007FB3426CD33A in /opt/tritonserver/bin/../lib/libtritonserver.so
26# 0x00007FB34282315D in /opt/tritonserver/bin/../lib/libtritonserver.so
27# 0x00007FB3426CEE51 in /opt/tritonserver/bin/../lib/libtritonserver.so
28# 0x00007FB3426CE185 in /opt/tritonserver/bin/../lib/libtritonserver.so
29# 0x00007FB3426D044E in /opt/tritonserver/bin/../lib/libtritonserver.so
30# 0x00007FB3426D0411 in /opt/tritonserver/bin/../lib/libtritonserver.so
31# 0x00007FB3426D03BE in /opt/tritonserver/bin/../lib/libtritonserver.so
32# 0x00007FB3426D0392 in /opt/tritonserver/bin/../lib/libtritonserver.so
33# 0x00007FB3426D0376 in /opt/tritonserver/bin/../lib/libtritonserver.so
34# 0x00007FB34208E253 in /opt/tritonserver/libs/./libstdc++.so.6
35# 0x00007FB341E1DAC3 in /usr/lib/x86_64-linux-gnu/libc.so.6
36# 0x00007FB341EAF850 in /usr/lib/x86_64-linux-gnu/libc.so.6