open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

OTLP Receiver through batch to Exporter ClickHouse creating double the points received #9452

Open JustinMason opened 7 months ago

JustinMason commented 7 months ago

Describe the bug Metric points double between the receiver and what the exporter persists.

Steps to reproduce I run a single instance of the Collector. Its OTLP receiver reports a metric point rate of 20k. The memory_limiter processor reports a point rate of 40k, the batch processor also reports 40k, and the exporter point rate is 40k. The exporter is ClickHouse, where I can see two identical records for each point.

What did you expect to see? I expect the processor and exporter metric point rates to match the receiver's rate.

What did you see instead? I see double the rates, and duplicate metric points in ClickHouse (the exporter's destination).

SELECT count(*) AS count, MetricName, TimeUnix, Value, `Attributes`, ResourceAttributes
FROM otel.otel_metrics_local_sum
WHERE TimeUnix > '2024-02-01 19:30:10'
GROUP BY MetricName, TimeUnix, Value, `Attributes`, ResourceAttributes
HAVING count(*) > 1

What version did you use? v0.93.0

What config did you use?

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: '0.0.0.0:4317'
        tls:
          cert_file: '/opt/certs/tls.crt'
          key_file: '/opt/certs/tls.key'
          ca_file: '/opt/certs/ca.crt'
processors:
  batch:
    send_batch_size: 11000
    timeout: '500ms'
  memory_limiter:
    check_interval: '5s'
    limit_percentage: 90
  resourcedetection:
    detectors:
      - 'gcp'
    timeout: '10s'
exporters:
  clickhouse:
    endpoint: 'tcp://chi-telemetry-prod....:9000/otel,tcp://chi-telemetry-prod.....:9000/otel'
    database: 'otel'
    username: '<user>'
    password: '<pwd>'
    ttl: 0
    logs_table_name: 'otel_logs_no_replica'
    traces_table_name: 'otel_traces_no_replica'
    metrics_table_name: 'otel_metrics_local'
    timeout: '5s'
    sending_queue:
      storage: 'file_storage/otc'
      queue_size: 100000
      num_consumers: 14
    retry_on_failure:
      enabled: true
      initial_interval: '5s'
      max_interval: '120s'
      max_elapsed_time: '0'
extensions:
  health_check: {}
  file_storage/otc:
    directory: '/etc/otel-collector/buffer'
    timeout: '1s'
    compaction:
      on_start: false
      on_rebound: true
      directory: '/etc/otel-collector/buffer/tmp/'
      rebound_needed_threshold_mib: '20000'
      rebound_trigger_threshold_mib: '10000'
      max_transaction_size: 0
    fsync: true
service:
  extensions:
    - 'file_storage/otc'
    - 'health_check'
  pipelines:
    metrics:
      receivers:
        - 'otlp'
      processors:
        - 'memory_limiter'
        - 'batch'
      exporters:
        - 'clickhouse'
  telemetry:
    metrics:
      address: '0.0.0.0:8888'
      level: 'normal'

Environment GKE

Additional context

TylerHelmuth commented 7 months ago

Can you add the debug exporter and look for duplicate datapoints? If you don't see any duplicates with the debug exporter, I would like to blame the clickhouse exporter and move this issue to Collector Contrib.
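
A minimal sketch of that change against the config posted above, assuming the `debug` exporter from the Collector core repository with `verbosity` set to `detailed` so that each data point is written to the Collector's own log. Only the keys shown here would be added; everything else in the config stays as it is.

```yaml
# Sketch only: add a debug exporter next to the existing clickhouse exporter.
exporters:
  debug:
    # 'detailed' prints every data point, so duplicates can be spotted in the log.
    verbosity: 'detailed'

service:
  pipelines:
    metrics:
      receivers:
        - 'otlp'
      processors:
        - 'memory_limiter'
        - 'batch'
      exporters:
        - 'clickhouse'
        # Same pipeline, so it sees exactly what is handed to clickhouse.
        - 'debug'
```

At the reported point rate, detailed output will be very verbose, so it is probably best to enable it only briefly while comparing the logged data points against what the receiver was sent.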