open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Error: Duplicate span IDs; skipping clock skew adjustment #9102

Closed: knayakar closed this issue 2 years ago.

knayakar commented 2 years ago

Hi Team,

While using the routing processor (routingprocessor), I'm seeing traces from otel-sampler routed to all of the backend collectors. My routing rule is based on service.name. Can you let me know if I'm missing something here? The configuration is below.

Steps to reproduce: this is the configuration from my ConfigMap, including the routing rule:

    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
        timeout: 10s
      memory_limiter:
        # 80% of maximum memory up to 2G
        limit_mib: 1500
        # 25% of limit up to 2G
        spike_limit_mib: 512
        check_interval: 5s
      tail_sampling:
        policies:
          [
              {
              name: latency-limit,
              type: latency,
              latency: {threshold_ms: 5000}
              },
              {
              name: http_status_code,
              type: numeric_attribute,
              numeric_attribute: {key: http.status_code, min_value: 400, max_value: 600}
              },
              {
                name: and-policy-1,
                type: and,
                and: {
                  and_sub_policy: 
                  [
                    {
                      name: test-and-policy-1,
                      type: string_attribute,
                      string_attribute: { key: service.name, values: [ appx] }
                    },
                    {
                      name: test-policy-2,
                      type: probabilistic,
                      probabilistic: {sampling_percentage: 10}
                    }
                  ]
                }
              },
              {
                name: and-policy-2,
                type: and,
                and: {
                  and_sub_policy: 
                  [
                    {
                      name: test-and-policy-3,
                      type: string_attribute,
                      string_attribute: { key: service.name, values: [ appy] }
                    },
                    {
                      name: test-and-policy-4,
                      type: probabilistic,
                      probabilistic: { sampling_percentage: 10}
                    }
                  ]
                }
              }
          ]
      routing:
        from_attribute: service.name
        attribute_source: resource
        default_exporters:
        - jaeger/other
        table:
        - value: appx
          exporters: [jaeger/appx]
        - value: appy
          exporters: [jaeger/appy]
    exporters:
      jaeger/other:
        endpoint: <jaeger-other-endpoint> 
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 20
          queue_size: 10000
        retry_on_failure:
          enabled: true
          initial_interval: 10s
          max_interval: 60s
          max_elapsed_time: 10m
        timeout: 5s
      jaeger/appx:
        endpoint: <jaeger-appx-endpoint>
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 20
          queue_size: 10000
        retry_on_failure:
          enabled: true
          initial_interval: 10s
          max_interval: 60s
          max_elapsed_time: 10m
        timeout: 5s
      jaeger/appy:
        endpoint: <jaeger-appy-endpoint>
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 20
          queue_size: 10000
        retry_on_failure:
          enabled: true
          initial_interval: 10s
          max_interval: 60s
          max_elapsed_time: 10m
        timeout: 5s
    service:
      extensions: [health_check, memory_ballast]
      pipelines:
        traces/other:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, routing]
          exporters: [jaeger/other, jaeger/appx, jaeger/appy]
        traces/appx:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, routing]
          exporters: [jaeger/other, jaeger/appx, jaeger/appy]
        traces/appy:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, routing]
          exporters: [jaeger/other, jaeger/appx, jaeger/appy]

What did you expect to see? Traces sent only to the Jaeger backend specified by the routing table.

What did you see instead? Traces are sent to all backends, and the Jaeger UI shows the warning "duplicate span IDs; skipping clock skew adjustment" on the duplicated spans.

What version did you use? Jaeger v1.31.0, Jaeger UI v1.20.1.

knayakar commented 2 years ago

@jpkrohling can you please check above?

berylshow commented 2 years ago

I run JMeter in a container for performance testing. In Kubernetes, I use the OTel Collector to send the traces to Jaeger, and I see a similar problem: "invalid parent span IDS = 716ef42737eac829; skipping clock skew adjustment"

knayakar commented 2 years ago

The issue turned out to be my sampler's pipeline configuration. Replacing the three pipelines with the single pipeline below fixed it:

    service:
      extensions: [health_check, memory_ballast]
      pipelines:
        traces/other:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, routing]
          exporters: [jaeger/other, jaeger/appx, jaeger/appy]
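
A plausible explanation for why this helps, offered as a sketch rather than a confirmed diagnosis: in the Collector, a receiver listed in several pipelines hands every incoming batch to each of those pipelines, and each pipeline gets its own processor instances. In the original configuration all three pipelines list the same `otlp` receiver and the same `routing` processor, so each span is routed once per pipeline, and the Jaeger exporter selected by its `service.name` can receive up to three copies. That is consistent with Jaeger's "duplicate span IDs" warning. The minimal configuration below should make the fan-out visible; it assumes the `logging` exporter (renamed `debug` in newer Collector releases) as a stand-in for the Jaeger exporters, and the pipeline names `traces/a` and `traces/b` are hypothetical.

    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      # stand-in for the Jaeger exporters; the same exporter instance
      # is shared by every pipeline that lists it
      logging:
    service:
      pipelines:
        # both pipelines consume from the same otlp receiver,
        # so every incoming batch is delivered to each of them
        traces/a:
          receivers: [otlp]
          exporters: [logging]
        traces/b:
          receivers: [otlp]
          exporters: [logging]

Sending a single trace to the `otlp` receiver should produce two log entries for the same batch, one per pipeline. With a single pipeline, as in the fix above, each span passes through the routing processor once and reaches only the exporter its routing table selects.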

jpkrohling commented 2 years ago

Given your last comment, I'm closing this, but feel free to reopen if there's something to be done here.