open-telemetry / opentelemetry-collector


[telemetry] Multiple periodic readers increment the counter abnormally #11327

Open tosuke opened 2 days ago

tosuke commented 2 days ago

Describe the bug

When multiple (periodic) readers are present in service.telemetry.metrics.readers, metrics such as otelcol_process_runtime_total_alloc_bytes and otelcol_process_cpu_seconds increase abnormally.

Steps to reproduce
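
A minimal Go sketch of the failing shape, reduced to the otel-go SDK: it assumes the affected collector metrics are asynchronous (observable) counters collected by two periodic readers, as in the config below. The metric name, the MemStats callback, and the stdout exporters are illustrative stand-ins, not the collector's actual code.

package main

import (
    "context"
    "log"
    "runtime"
    "time"

    "go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
    "go.opentelemetry.io/otel/metric"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
    ctx := context.Background()

    // One exporter per periodic reader (stdout instead of OTLP to keep the
    // sketch self-contained).
    exp1, err := stdoutmetric.New()
    if err != nil {
        log.Fatal(err)
    }
    exp2, err := stdoutmetric.New()
    if err != nil {
        log.Fatal(err)
    }

    // Mirror service.telemetry.metrics.readers: two periodic readers with
    // 15s and 60s intervals on the same MeterProvider.
    mp := sdkmetric.NewMeterProvider(
        sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exp1, sdkmetric.WithInterval(15*time.Second))),
        sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exp2, sdkmetric.WithInterval(60*time.Second))),
    )
    defer func() { _ = mp.Shutdown(ctx) }()

    // An observable counter fed from runtime.MemStats, standing in for
    // otelcol_process_runtime_total_alloc_bytes (name is illustrative).
    meter := mp.Meter("repro")
    _, err = meter.Int64ObservableCounter(
        "process.runtime.total_alloc_bytes",
        metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
            var ms runtime.MemStats
            runtime.ReadMemStats(&ms)
            o.Observe(int64(ms.TotalAlloc))
            return nil
        }),
    )
    if err != nil {
        log.Fatal(err)
    }

    // Let both readers collect a few times, then compare the exported values
    // against runtime.MemStats.TotalAlloc.
    time.Sleep(2 * time.Minute)
}

If the value exported here grows faster with two readers than with one (i.e. faster than TotalAlloc itself), the problem is in the SDK rather than in the reader configuration, which is consistent with the comments further down.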

What did you expect to see?

(screenshot)

What did you see instead?

(screenshot)

What version did you use?

v0.110.0

What config did you use?

exporters:
  otlphttp/prometheus:
    metrics_endpoint: http://prometheus:9090/api/v1/otlp/v1/metrics

connectors:
  forward:

receivers:
  hostmetrics:
    collection_interval: 15s
    root_path: /
    scrapers:
      cpu:
      disk:
      filesystem:
      load:
      memory:
        metrics:
          system.linux.memory.available:
            enabled: true
          system.memory.limit:
            enabled: true
      network:
      paging:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 64
  batch:
    timeout: 1s
  resource:
    attributes:
      - key: service.namespace
        value: net
        action: insert
  resource/host:
    attributes:
      - key: host.name
        value: ${file:/etc/hostname}
        action: insert
      - key: host.id
        value: ${file:/etc/machine-id}
        action: insert
  resource/hostmetrics:
    attributes:
      - key: service.name
        action: insert
        value: host
      - key: service.instance.id
        action: insert
        from_attribute: host.id
service:
  telemetry:
    resource:
      service.namespace: net
      host.name: ${file:/etc/hostname}
      host.id: ${file:/etc/machine-id}
    metrics:
      readers:
        - periodic:
            interval: 15000 # 15s
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: http://prometheus:9090/api/v1/otlp/v1/metrics
        - periodic:
            interval: 60000 # 1m
            exporter:
              otlp:
                protocol: grpc/protobuf
                endpoint: https://injest-otlp.example.com:4317
    logs:
      encoding: console
  pipelines:
    metrics/host:
      receivers: [hostmetrics]
      processors:
        - memory_limiter
        - batch
        - resource
        - resource/host
        - resource/hostmetrics
      exporters: [forward]
    metrics/prometheus_write:
      receivers: [forward]
      exporters: [otlphttp/prometheus]

Environment

OS: Debian 12
Compiler: go 1.23.0

Additional context

bogdandrutu commented 1 day ago

This is an otel-go bug. cc @open-telemetry/go-approvers @MrAlias

Random idea: can you plot rate(otelcol_process_runtime_total_alloc_bytes[5m])? Since you export every 1m, you may see some spikes if you plot for 1m.

tosuke commented 1 day ago

Here is the plot for rate(otelcol_process_runtime_alloc_bytes_total[5m]): (screenshot)

dashpole commented 1 day ago

Tracking bug in OTel-go: https://github.com/open-telemetry/opentelemetry-go/issues/5866