triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Triton server receives Signal (11) when tracing is enabled with no sampling (or a small sampling rate) #7795

Open nicomeg-pr opened 3 days ago

nicomeg-pr commented 3 days ago

Description

When starting Triton Server with tracing enabled and a generic model (e.g., identity_model_fp32 from the Python backend example), the server crashes with signal (11) after handling a few thousand requests at a relatively high QPS (> 100).

The issue appears to be driven primarily by the QPS rather than by the total number of requests sent to the server: the higher the QPS, the sooner the signal (11) crash occurs.

I get the following error message:

Signal (11) received.
14# 0x00007F78C0DED850 in /lib/x86_64-linux-gnu/libc.so.6
13# 0x00007F78C0D5BAC3 in /lib/x86_64-linux-gnu/libc.so.6
12# 0x00007F78C0FCC253 in /lib/x86_64-linux-gnu/libstdc++.so.6
11# 0x00005A51911D67F2 in tritonserver
10# 0x00005A5191336143 in tritonserver
9# 0x00005A51911E9411 in tritonserver
8# 0x00005A51911E7B7D in tritonserver
7# 0x00005A5191855163 in tritonserver
6# 0x00005A519121F25C in tritonserver
5# 0x00005A5191856623 in tritonserver
4# 0x00005A519121E5B0 in tritonserver
3# 0x00005A5191858D2A in tritonserver
2# 0x00005A5191858B84 in tritonserver
1# 0x00007F78C0D09520 in /lib/x86_64-linux-gnu/libc.so.6
0# 0x00005A519117D52D in tritonserver

I receive a lot of warnings before signal (11): [Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b

I tested with several backends and models (torchscript, python, onnx) and observed the same behavior across all of them, on T4 and A100 GPUs.

The issue appears to be related to the --trace-config rate parameter (i.e., the sampling rate). When the rate is set to 100 or higher, everything works fine. However, when it is set between 1 and 100, the server receives signal (11) and restarts.
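
For reference, the same trace settings can be exercised with a standalone tritonserver launch outside Kubernetes. This is a minimal sketch, assuming a local model repository at /models and an OTLP/HTTP collector listening on localhost:4318; the flags are the same ones used in the deployment below:

tritonserver \
  --model-repository=/models \
  --trace-config mode=opentelemetry \
  --trace-config opentelemetry,url=http://localhost:4318/v1/traces \
  --trace-config opentelemetry,bsp_max_export_batch_size=1 \
  --trace-config rate=1 \
  --trace-config level=TIMESTAMPS \
  --log-warning=1 \
  --log-error=1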

Triton Information

I use Triton version 24.09.

I used the standard container: nvcr.io/nvidia/tritonserver:24.09-py3

To Reproduce

Use a sample model from the repo, e.g., identity_fp32.
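
For context, a minimal Python-backend identity model can be laid out as below. This is an illustrative sketch (tensor names, shapes, and the CPU instance group are assumptions), not the exact example shipped in the repo:

model_repository/
  identity_fp32/
    config.pbtxt
    1/
      model.py

# config.pbtxt (illustrative)
name: "identity_fp32"
backend: "python"
max_batch_size: 0
input [ { name: "INPUT0", data_type: TYPE_FP32, dims: [ -1 ] } ]
output [ { name: "OUTPUT0", data_type: TYPE_FP32, dims: [ -1 ] } ]
instance_group [ { kind: KIND_CPU } ]

# model.py (illustrative): echoes INPUT0 back as OUTPUT0
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            input0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            output0 = pb_utils.Tensor("OUTPUT0", input0.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[output0]))
        return responses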

Deploy it with the following Helm chart deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
spec:
  replicas: 1
  minReadySeconds: 30
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
      annotations:
        ad.datadoghq.com/{{ .Release.Name }}.checks: |
          {
            "openmetrics": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8002/metrics",
                  "namespace": "dev.backend.tritonserver",
                  "metrics": ["nv_.*"],
                  "tags":["env:dev"]
                }
              ]
            }
          }
    spec:
      serviceAccountName: {{ .Values.serviceAccountName }}
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      containers:
        - name: {{ .Release.Name }}
          image: nvcr.io/nvidia/tritonserver:24.09-py3
          imagePullPolicy: Always
          command:
            - tritonserver
            - --model-repository=gs://path/to/repo
            - --trace-config
            - mode=opentelemetry
            - --trace-config
            - opentelemetry,url=http://datadog-agent-agent.datadog.svc:4318/v1/traces
            - --trace-config
            - opentelemetry,bsp_max_export_batch_size=1
            - --trace-config
            - opentelemetry,resource=service.name=backend.tritonserver
            - --trace-config
            - opentelemetry,resource=deployment.environment=dev
            - --trace-config
            - rate=1
            - --trace-config
            - level=TIMESTAMPS
            - --log-warning=1
            - --log-error=1
          ports:
            - name: http
              containerPort: 8000
            - name: grpc
              containerPort: 8001
            - name: metrics
              containerPort: 8002
          livenessProbe:
            initialDelaySeconds: 60
            failureThreshold: 3
            periodSeconds: 10
            httpGet:
              path: /v2/health/live
              port: http
          readinessProbe:
            initialDelaySeconds: 60
            periodSeconds: 5
            failureThreshold: 3
            httpGet:
              path: /v2/health/ready
              port: http
          startupProbe:
            periodSeconds: 10
            failureThreshold: 30
            httpGet:
              path: /v2/health/ready
              port: http
          resources:
            limits:
              nvidia.com/gpu: 1
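
For reference, the request load can be generated with a short tritonclient script along these lines. This is an illustrative sketch only (model name, tensor name, and input shape are assumptions matching the identity sketch above), not the exact client we used:

# load_client.py (illustrative): send a sustained stream of HTTP inference
# requests to the identity_fp32 model using the async HTTP client pool.
import numpy as np
import tritonclient.http as httpclient

URL = "localhost:8000"   # assumption: the pod's HTTP port, e.g. via port-forward
MODEL = "identity_fp32"  # assumption: model name from the sketch above
TOTAL_REQUESTS = 10000
CONCURRENCY = 32         # keep enough requests in flight to exceed ~100 QPS

def build_input():
    data = np.random.rand(16).astype(np.float32)
    inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    return inp

def main():
    client = httpclient.InferenceServerClient(url=URL, concurrency=CONCURRENCY)
    pending = []
    for _ in range(TOTAL_REQUESTS):
        pending.append(client.async_infer(MODEL, inputs=[build_input()]))
        # Drain in waves so a bounded number of requests stays in flight.
        if len(pending) >= CONCURRENCY:
            for fut in pending:
                fut.get_result()
            pending = []
    for fut in pending:
        fut.get_result()

if __name__ == "__main__":
    main()

The perf_analyzer tool from the Triton client SDK container can be used to generate a similar sustained request rate.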

Expected behavior

After a few thousand requests at a high QPS, the server should receive signal (11) and restart.

rmccorm4 commented 2 days ago

Hi @nicomeg-pr, thanks for raising this.

I receive a lot of warning before signal (11): [Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/b

  1. Was this warning cut off? Is there more to it?
  2. Can you reproduce this using Triton's built-in tracing mode instead of opentelemetry mode? (See the sketch after this list.)
  3. Can you reproduce with both HTTP and GRPC clients, or only one?
  4. Can you share the client script to send the request load?
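
For reference on item 2, a minimal sketch of the equivalent trace flags for Triton's built-in trace mode (the output file path is just an example):

--trace-config mode=triton
--trace-config triton,file=/tmp/triton_trace.json
--trace-config rate=1
--trace-config level=TIMESTAMPS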

CC @indrajit96 @oandreeva-nv

nicomeg-pr commented 2 days ago

Here is the complete warning message; sorry, it was truncated:

[Warning] File: /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/opentelemetry-cpp/src/opentelemetry-cpp/sdk/src/trace/batch_span_processor.cc:55 BatchSpanProcessor queue is full - dropping span.

All the warnings are the same.
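
For context, this warning is emitted by opentelemetry-cpp's BatchSpanProcessor when its internal span queue overflows. Assuming the opentelemetry trace-config group also exposes the queue size and schedule delay (bsp_max_queue_size and bsp_schedule_delay, alongside the bsp_max_export_batch_size already set in the deployment above), tuning them might reduce the dropped-span warnings; the values below are illustrative only:

--trace-config opentelemetry,bsp_max_queue_size=4096
--trace-config opentelemetry,bsp_schedule_delay=1000
--trace-config opentelemetry,bsp_max_export_batch_size=512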