jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

[Feature/SPM]: Support spanmetrics connector #4345

Closed albertteoh closed 1 year ago

albertteoh commented 1 year ago

Requirement

As a Jaeger operator, I want to use the newly introduced spanmetrics connector for the following reasons:

Problem

The known breaking issues include:

Proposal

Make the metric names configurable.

Perhaps introduce an spm parameter namespace where metric names can be configured, e.g. --spm.calls-metric-name and --spm.latency-metric-name.

This would also require an update to the example provided in docker-compose/monitor.
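For illustration, a minimal sketch of how the proposed flags could appear in that docker-compose example (the --spm.* flag names are only the ones proposed above and do not exist yet; METRICS_STORAGE_TYPE and --prometheus.server-url are taken from the existing SPM setup, and the prometheus hostname is a placeholder):

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - METRICS_STORAGE_TYPE=prometheus
    command:
      - "--prometheus.server-url=http://prometheus:9090"
      # Proposed flags (hypothetical until implemented): point SPM at the
      # metric names the spanmetrics connector actually emits.
      - "--spm.calls-metric-name=calls"
      - "--spm.latency-metric-name=duration"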

Other suggestions welcome.

Open questions

No response

warning-explosive commented 1 year ago

I have a workaround based on the metricstransform processor. Here is an example otel-collector-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  metricstransform/insert:
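    # Rename the spanmetrics connector's output (metrics "calls" and "duration",
    # label "span.name") to the names the Jaeger Monitor tab currently expects
    # ("calls_total", "latency", "operation").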
    transforms:
      - include: calls
        match_type: strict
        action: update
        new_name: calls_total
        operations:
        - action: update_label
          label: span.name
          new_label: operation
      - include: duration
        match_type: strict
        action: update
        new_name: latency
        operations:
          - action: update_label
            label: span.name
            new_label: operation

exporters:
  otlp:
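    # Forward the traces themselves to Jaeger's OTLP gRPC endpoint.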
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  prometheus:
    endpoint: "otel-collector:9464"
    resource_to_telemetry_conversion:
      enabled: true
    enable_open_metrics: true

connectors:
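  # Aggregates incoming spans into call counts and a latency histogram;
  # every dimension listed below becomes a label on the generated metrics.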
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code
    dimensions_cache_size: 1000
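    # The Prometheus exporter requires cumulative temporality.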
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"

service:
  pipelines:
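    # The spanmetrics connector is wired as an exporter of the traces pipeline
    # and as a receiver of the metrics pipeline, bridging the two.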
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, spanmetrics]
    metrics:
      receivers: [otlp, spanmetrics]
      processors: [metricstransform/insert]
      exporters: [prometheus]

Owenxh commented 1 year ago

I want to use the spanmetrics connector too.

utezduyar commented 1 year ago

The workaround above worked well for me, however something is not entirely right. I have a demo application with 5 services talking to each other. Four of them have data under Monitor, but one does not. The service that is not working is very similar to one of the services that do work, which puzzles me.

Maybe it is something about the name of the span? Any tips on how to debug it?

warning-explosive commented 1 year ago

Several possible issues and solutions come to mind:

  1. Spanmetrics namespaces - check whether a metric namespace is configured, since it is prefixed onto the metric names
  2. Rejected/lost/dropped spans - check the otel-collector's own metrics; you may have a configuration or connectivity issue (see the sketch after this list)
  3. Check your app's exporter - substitute the OTLP exporter with a console exporter to make sure the expected spans are actually being produced
  4. See the debug logs - https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md
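
For points 2 and 4, a minimal sketch of the collector-side additions (assuming a reasonably recent collector; the console-style exporter is called logging in older versions and debug in newer ones):

exporters:
  logging:                    # rename to "debug" on newer collector versions
    verbosity: detailed

service:
  telemetry:
    logs:
      level: debug            # verbose collector logs (point 4)
    metrics:
      address: 0.0.0.0:8888   # the collector's own metrics, including dropped/refused counts (point 2)
  pipelines:
    metrics:
      receivers: [otlp, spanmetrics]
      processors: [metricstransform/insert]
      exporters: [prometheus, logging]   # also print the metrics the spanmetrics connector emits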

utezduyar commented 1 year ago

Thanks! The problem was that the application was not setting the HTTP semantic convention attributes on its spans. Otherwise your workaround works like a charm.