nicomeg-pr opened this issue 3 days ago
Hi @nicomeg-pr, thanks for raising this.

> I receive a lot of warnings before signal (11): [Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b

Could you try the `triton` tracing mode instead of the `opentelemetry` mode? CC @indrajit96 @oandreeva-nv
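For reference, a minimal sketch of the two modes as launch flags; the model repository path and the OTLP endpoint URL below are placeholders, not taken from this issue:

```
# OpenTelemetry tracing mode (the configuration reported to crash)
tritonserver --model-repository=/models \
    --trace-config mode=opentelemetry \
    --trace-config opentelemetry,url=localhost:4318/v1/traces

# Triton's built-in tracing mode, which writes traces to a local file
# and does not go through the OpenTelemetry batch span processor
tritonserver --model-repository=/models \
    --trace-config mode=triton \
    --trace-config triton,file=trace.json
```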
Here is the complete warning message; sorry, it was truncated:
[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/batch_span_processor.cc:55 BatchSpanProcessor queue is full - dropping span.
All the warnings are the same.
Description
When starting Triton Server with tracing enabled and a generic model (e.g., identity_model_fp32 from the Python backend example), the server crashes with signal (11) after handling a few thousand requests at a relatively high QPS (> 100). The crash appears to be driven primarily by the QPS rather than by the total number of requests sent to the server: the higher the QPS, the sooner the signal (11) occurs.
I receive a lot of warnings before the signal (11):
[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b
I tested with several backends and models (torchscript, python, onnx) and observed the same behavior across all of them, on both T4 and A100 GPUs.

The issue appears to be related to the --trace-config sampling rate parameter. When the rate is set to 100 or higher, everything works fine. However, when it is set between 1 and 100, the server receives signal (11) and restarts.
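If rate here is Triton's trace sampling interval (one out of every N requests is traced), then lower values trace a larger share of the traffic, which would be consistent with the crash only appearing below 100. A minimal sketch of the setting, with a placeholder model repository path:

```
# rate is a sampling interval: one out of every N requests is traced,
# so rate=10 produces roughly 10x as many spans as rate=100.
tritonserver --model-repository=/models \
    --trace-config mode=opentelemetry \
    --trace-config rate=100   # reported stable; rate=10 reportedly crashes
```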
Triton Information
I use Triton version 24.09, with the standard container nvcr.io/nvidia/tritonserver:24.09-py3.
To Reproduce
Use a sample model from the repo, e.g. identity_fp32.
Deploy it with the following Helm chart deployment:
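Once the server is deployed, load in the range described (> 100 QPS) can be generated with perf_analyzer; a minimal sketch, assuming the server's HTTP endpoint is reachable on localhost:8000 and using an illustrative request rate:

```
# Send a sustained 200 requests/sec to the identity model.
# perf_analyzer ships in the tritonserver *-py3-sdk images.
perf_analyzer -m identity_fp32 \
    -u localhost:8000 \
    --request-rate-range=200 \
    --measurement-interval=10000
```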
Expected behavior
After a few thousand requests at a high QPS, the server should receive a signal (11) and restart.