open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[connector/spanmetrics] - unable to export the metrics using prometheusremotewrite when batch processor enabled #32042

Open ramanjaneyagupta opened 5 months ago

ramanjaneyagupta commented 5 months ago

Component(s)

connector/spanmetrics

What happened?

Description

Unable to export span metrics using prometheusremotewrite when the batch processor is enabled. The setup is Agents -> Gateway (OTel Collectors) -> Storage. The gateway consists of multiple servers, each of which calculates span metrics and writes them to Prometheus using Prometheus Remote Write (PRW).

Steps to Reproduce

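Enable the spanmetrics connector, export its metrics with the prometheusremotewrite exporter, and include the batch processor in the metrics pipeline.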

Expected Result

The prometheusremotewrite exporter should be able to export the metrics when the batch processor is enabled.

Actual Result

The export fails with "Permanent Error", "Duplicate Sample For timestamp", and "Permanent error remote write returned status code of 400 bad request".

Collector version

v0.96.0

Environment information

Environment

OS: Linux (RHEL)

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      http:
      grpc:

exporters:
  prometheusremotewrite:
    endpoint: http://<endpoint>
    target_info:
      enabled: true
    resource_to_telemetry_conversion:
      enabled: true
  # tracebackend: trace exporter used by the traces pipeline; its definition was not included in the report

connectors:
  spanmetrics:
    dimensions:
      - name: http.method
      - name: http.status_code
      - name: k8s.namespace.name
    exemplars:
      enabled: true
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
    metrics_flush_interval: 15s
    resource_metrics_key_attributes:
      - service.name
      - telemetry.sdk.language
      - telemetry.sdk.name

processors:
  resourcedetection/system:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]
  batch:  # referenced by both pipelines below but missing from the original snippet
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection/system, batch]
      exporters: [spanmetrics,tracebackend]
    metrics:
      receivers: [spanmetrics]
      processors: [resourcedetection/system, batch]
      exporters: [prometheusremotewrite]
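The title implies the error appears only when batch is part of the metrics pipeline. A minimal sketch of the comparison pipeline, assuming every other part of the configuration above stays unchanged, simply drops batch from the metrics processors (only the metrics pipeline is shown):

```yaml
service:
  pipelines:
    metrics:
      receivers: [spanmetrics]
      # batch removed for comparison; all other settings as in the full config above
      processors: [resourcedetection/system]
      exporters: [prometheusremotewrite]
```

If the 400 responses persist with this variant, the batch processor is likely not the cause; if they disappear, that points toward an interaction with batching.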

Log output

" Permanent error remote write returned status code of 400 bad request" "err"=nil duplicate sample for timestamp"

Additional context

No response

github-actions[bot] commented 5 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

ankitpatel96 commented 4 months ago

I would guess that the problem here is that your multiple collectors running in gateway mode are submitting the same samples to Prometheus. Between the default list of

service.name
span.name
span.kind
status.code

and your dimensions list of

http.method
http.status_code
k8s.namespace.name

I would guess that, taken together, these do not uniquely identify a series. Is each collector receiving traces from the same machines? If not the exact same machines, would each collector be receiving traces from containers that run the same service in the same k8s namespace?

I would also guess that this is the same problem as in #32043.

github-actions[bot] commented 1 month ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Frapschen commented 3 weeks ago

Hi @ramanjaneyagupta, does the problem still exist? Have you tried adding a unique identity to your metrics?
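One way to add such an identity, sketched under the assumption that the duplicates come from several gateway collectors emitting identical series: use the resource processor in the metrics pipeline to stamp each collector's output with its own identifier. Because resource_to_telemetry_conversion is enabled in the exporter, that resource attribute is turned into a Prometheus label. The attribute name collector.instance.id and the environment variable COLLECTOR_ID are hypothetical; COLLECTOR_ID would need a different value on each gateway server.

```yaml
processors:
  resource/collector_id:
    attributes:
      # hypothetical attribute name; becomes a label via resource_to_telemetry_conversion
      - key: collector.instance.id
        # hypothetical env var; set to a distinct value on every gateway server
        value: ${env:COLLECTOR_ID}
        action: upsert

service:
  pipelines:
    metrics:
      receivers: [spanmetrics]
      # add the identity after the connector so it applies to the generated metrics
      processors: [resourcedetection/system, resource/collector_id, batch]
      exporters: [prometheusremotewrite]
```

Placing the processor in the metrics pipeline rather than the traces pipeline keeps the spans themselves unchanged and only affects the span metrics sent to Prometheus.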