open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

prometheusremotewrite exporter with histogram is causing metrics export failure due to high memory (90%) #30675

Open bhupeshpadiyar opened 5 months ago

bhupeshpadiyar commented 5 months ago

Component(s)

exporter/prometheusremotewrite

What happened?

Description

Collector memory and CPU usage spike while exporting histogram metrics with the prometheusremotewrite exporter, causing the metrics export to fail with the following error logs:

2024-01-17T10:33:50.902Z info memorylimiterprocessor@v0.92.0/memorylimiter.go:287 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 16085}

2024-01-17T10:50:08.919Z error scrape/scrape.go:1351 Scrape commit failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "otel-collector", "target": "http://0.0.0.0:8888/metrics", "error": "data refused due to high memory usage"}

2024-01-17T15:07:12.464Z warn memorylimiterprocessor@v0.92.0/memorylimiter.go:294 Memory usage is above soft limit. Refusing data. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 16081}

Note - This issue happens only with histogram metrics; the exporter works fine with counter and gauge metrics.
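
For context, on the remote-write path each classic (explicit-bucket) histogram data point fans out into several Prometheus time series: one _bucket series per bucket boundary plus the +Inf bucket, plus a _sum and a _count series, multiplied again for every distinct attribute set. As a rough illustration, a hypothetical histogram named request_latency_seconds with three explicit bounds would be written as:

request_latency_seconds_bucket{le="0.1"}
request_latency_seconds_bucket{le="0.5"}
request_latency_seconds_bucket{le="1"}
request_latency_seconds_bucket{le="+Inf"}
request_latency_seconds_sum
request_latency_seconds_count

This fan-out is one likely reason histograms cost the exporter noticeably more memory than counters or gauges, though it does not by itself explain the failure.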

Steps to Reproduce

Expected Result

All metric types (counter, gauge, histogram) should be exported seamlessly, without errors.

Actual Result

Exporting histogram metrics causes high memory usage and the export fails; see the memory_limiter and scrape errors above and the full output under Log output below.

(Attached screenshots: collector memory/CPU, exporter queue, batch metrics, metrics point rate.)

Collector version

0.92.0 (Confirmed with older collector versions as well)

Environment information

Environment

OS: Linux/ARM64 - AWS ECS (Fargate) Cluster

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4318

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: [ '0.0.0.0:8888' ]
exporters:
  logging:
    verbosity: "basic"
    sampling_initial: 5
  prometheusremotewrite:
    endpoint: "http://<victoria-metrics-instance>:8428/prometheus/api/v1/write"
    tls:
      insecure: true
processors:
  batch:
    timeout:
    send_batch_size:
    send_batch_max_size:
  memory_limiter:
    check_interval: 200ms
    limit_mib: 20000
    spike_limit_mib: 4000
extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1888
    block_profile_fraction: 3
    mutex_profile_fraction: 5
  zpages:
    endpoint: 0.0.0.0:55679
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [logging, prometheusremotewrite]
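
The batch processor keys above (timeout, send_batch_size, send_batch_max_size) are left empty, so the processor presumably falls back to its defaults and nothing caps the size of the batches handed to the exporter. A hedged variant with explicit limits, purely as a sketch (the numbers are illustrative placeholders, not recommendations), could look like:

processors:
  batch:
    timeout: 5s                  # flush interval (illustrative)
    send_batch_size: 8192        # batch size that triggers a send (illustrative)
    send_batch_max_size: 16384   # hard upper bound on an outgoing batch (illustrative)
  memory_limiter:
    check_interval: 200ms
    limit_mib: 20000
    spike_limit_mib: 4000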

Log output

2024-01-17T10:33:50.902Z info memorylimiterprocessor@v0.92.0/memorylimiter.go:287 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 16085}

2024-01-17T10:50:08.919Z error scrape/scrape.go:1351 Scrape commit failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "otel-collector", "target": "http://0.0.0.0:8888/metrics", "error": "data refused due to high memory usage"}

2024-01-17T15:07:12.464Z warn memorylimiterprocessor@v0.92.0/memorylimiter.go:294 Memory usage is above soft limit. Refusing data. {"kind": "processor", "name": "memory_limiter", "pipeline": "metrics", "cur_mem_mib": 16081}


Additional context

This issue happens only with histogram metrics; the exporter works fine with counter and gauge metrics.
github-actions[bot] commented 5 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 commented 5 months ago

Note: This is possibly a duplicate of #24405

bhupeshpadiyar commented 5 months ago

Hi @crobert-1,

Just to clarify, we are facing this issue with regular Histogram metrics, whereas the linked issue seems to be about Exponential Histogram metrics.

github-actions[bot] commented 3 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 month ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.