open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Unable to see exemplar data from span metrics processor when exporter is set to prometheus #17496

Closed. vaish1707 closed this issue 1 year ago

vaish1707 commented 1 year ago

Component(s)

exporter/prometheus, processor/spanmetrics

What happened?

Hi Team, I'm trying to generate metrics from spans using the span metrics processor, which I'm able to do successfully. The exporter for the metrics generated by the span metrics processor is set to prometheus.

I'm using helm charts to deploy the opentelemetry collector in Kubernetes, and the following is my configuration:

mode: "deployment"
replicaCount: 1
nameOverride: otel-collector
fullnameOverride: otel-collector

# Base collector configuration.
config:
  exporters:
    otlp:
      endpoint: otel-collector-grpc:4317
      tls:
        insecure: true
    prometheus:
      endpoint: "0.0.0.0:8889"
      metric_expiration: 1440m
      enable_open_metrics: true
  extensions:
    health_check: {}
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679
  processors:
    memory_limiter:
      check_interval: 1s
      limit_mib: 4000
      spike_limit_mib: 800
    batch: {}
    tail_sampling:
      policies:
        - name: drop_noisy_traces_url
          type: string_attribute
          string_attribute:
            key: http.target
            values:
              - \/health
              - \/ping
            enabled_regex_matching: true
            invert_match: true
    spanmetrics:
      metrics_exporter: prometheus
      dimensions:
        - name: http.method
        - name: http.status_code
        - name: http.target
      dimensions_cache_size: 1000
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
  receivers:
    jaeger: null
    prometheus: null
    zipkin: null
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
    otlp/spanmetrics:
      protocols:
        grpc:
          endpoint: 0.0.0.0:12346
  service:
    extensions:
      - pprof
      - zpages
      - health_check
    pipelines:
      metrics:
        exporters:
          - prometheus
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp/spanmetrics
      traces:
        exporters:
          - otlp
        processors:
          - memory_limiter
          - batch
          - tail_sampling
          - spanmetrics
        receivers:
          - otlp

# Configuration for ports
ports:
  otlp:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    hostPort: 4317
    protocol: TCP
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    hostPort: 4318
    protocol: TCP
  jaeger-thrift:
    enabled: true
    containerPort: 14268
    servicePort: 14268
    hostPort: 14268
    protocol: TCP
  jaeger-grpc:
    enabled: true
    containerPort: 14250
    servicePort: 14250
    hostPort: 14250
    protocol: TCP
  metrics:
    enabled: true
    containerPort: 8889
    servicePort: 8889
    protocol: TCP
  healthcheck:
    enabled: true
    containerPort: 13133
    servicePort: 13133
    protocol: TCP
  zpages:
    enabled: true
    containerPort: 55679
    servicePort: 55679
    protocol: TCP
  pprof:
    enabled: true
    containerPort: 1888
    servicePort: 1888
    protocol: TCP
# Resource limits & requests. Update according to your own use case as these values might be too low for a typical deployment.
resources:
  limits:
    cpu: 256m
    memory: 512Mi

service:
  type: NodePort
  annotations:
    alb.ingress.kubernetes.io/healthcheck-path: /

The following is an example metric that I see at localhost:8889/metrics, and it doesn't have any exemplar data. Do I need to change any configuration in the prometheus exporter or the span metrics processor in order to see exemplar data in the prometheus exporter?

    latency_bucket{http_method="GET",http_status_code="200",http_target="<http_target>",operation="<operation name>",service_name="<operation_name>",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="15000"} 1

Can someone please help me?

Collector version

0.67.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

(Same configuration as shown in the description above.)

Log output

No response

Additional context

No response

Frapschen commented 1 year ago

Thank you for reporting this bug. I wrote a unit test to exercise the spanmetrics side, and based on the result, I think the bug is coming from exporter/prometheus.

kovrus commented 1 year ago

@vaish1707 The exporter/prometheus side also seems fine. I think you have to query the exporter as follows to see exemplars:

curl -H 'Accept: application/openmetrics-text' 0.0.0.0:8889/metrics

enable_open_metrics must be set to true in the Prometheus exporter as well.
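
For reference, the relevant part of the exporter configuration (matching the settings already shown above) would look like this; exemplars are only exposed when the OpenMetrics format is negotiated, which is why enable_open_metrics needs to be true:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    enable_open_metrics: true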

kovrus commented 1 year ago

@vaish1707 have you tried that?

vaish1707 commented 1 year ago

@kovrus, I'm seeing the exemplar data in the latency_bucket metric when I use the curl command posted above. If I want to scrape the metrics with this exemplar data from another prometheus instance, what should be done? By the way, from that other prometheus instance I'm sending the metrics to Amazon Managed Prometheus and trying to make use of them in grafana.

The flow goes like this: OTEL collector prometheus exporter (prometheus-1) -> prometheus instance scraping the otel collector RED metrics (prometheus-2) -> Amazon Managed Prometheus -> grafana. In this case, how do I send the exemplar data successfully from the otel collector prometheus exporter to grafana? Should we enable this flag https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage in prometheus-2 config?

kovrus commented 1 year ago

Should we enable this flag https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage in prometheus-2 config?

yes, that should be enabled.
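
A minimal sketch of what that could look like on the prometheus-2 side, assuming it scrapes the collector's 8889 endpoint directly (the job name and target address are illustrative):

# Start prometheus-2 with exemplar storage enabled:
#   prometheus --enable-feature=exemplar-storage --config.file=prometheus.yml

# prometheus.yml (scrape job for the collector's prometheus exporter)
scrape_configs:
  - job_name: otel-collector-spanmetrics
    static_configs:
      - targets: ['otel-collector:8889']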

vaish1707 commented 1 year ago

Should we enable this flag https://prometheus.io/docs/prometheus/latest/feature_flags/#exemplars-storage in prometheus-2 config?

yes, that should be enabled.

From prometheus-2 I'm sending the metrics to Amazon Managed Prometheus with the remote_write configuration (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) and send_exemplars set to true. In this case I'm seeing the exemplar data in prometheus-2 but not in Amazon Managed Prometheus. Because of this, I'm unable to visualise the exemplar data in grafana with Amazon Managed Prometheus as the datasource.
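
For context, the remote_write block on prometheus-2 looks roughly like this (the workspace URL, region, and sigv4 settings are placeholders for my setup):

remote_write:
  - url: https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
    send_exemplars: true
    sigv4:
      region: <region>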

github-actions[bot] commented 1 year ago

Pinging code owners for exporter/prometheus: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for processor/spanmetrics: @albertteoh. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.