open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[prometheusremotewrite] Partial collector metrics exported after upgrade from v0.84.0 #33838

Closed mhawley1230 closed 1 month ago

mhawley1230 commented 3 months ago

Component(s)

exporter/prometheusremotewrite, receiver/prometheus

What happened?

Description

Hello, since upgrading from v0.84.0 to v0.102.1, the collector's internal metrics for its receivers, processors, and exporters have stopped being exported.
The internal collector metrics are still available by port-forwarding to port 8888.
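(For context, port 8888 is the collector's default internal telemetry port, configured under service.telemetry. A minimal sketch of that block, assuming defaults rather than our exact chart values:)

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888  # assumption: default port 8888; the Helm chart may set the host explicitly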

Partial list:

# HELP otelcol_exporter_prometheusremotewrite_translated_time_series Number of Prometheus time series that were translated from OTel metrics
# TYPE otelcol_exporter_prometheusremotewrite_translated_time_series counter
otelcol_exporter_prometheusremotewrite_translated_time_series{exporter="prometheusremotewrite",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 4.980208e+06
# HELP otelcol_exporter_queue_capacity Fixed capacity of the retry queue (in batches)
# TYPE otelcol_exporter_queue_capacity gauge
otelcol_exporter_queue_capacity{exporter="loki",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 1000
# HELP otelcol_exporter_queue_size Current size of the retry queue (in batches)
# TYPE otelcol_exporter_queue_size gauge
otelcol_exporter_queue_size{exporter="loki",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
# HELP otelcol_exporter_send_failed_log_records Number of log records in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_log_records counter
otelcol_exporter_send_failed_log_records{exporter="loki",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
# HELP otelcol_exporter_send_failed_metric_points Number of metric points in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_metric_points counter
otelcol_exporter_send_failed_metric_points{exporter="prometheusremotewrite",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
# HELP otelcol_exporter_send_failed_spans Number of spans in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_spans counter
otelcol_exporter_send_failed_spans{exporter="otlp/tempo",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
otelcol_exporter_send_failed_spans{exporter="otlphttp/honeycomb",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 0
# HELP otelcol_exporter_sent_log_records Number of log record successfully sent to destination.
# TYPE otelcol_exporter_sent_log_records counter
otelcol_exporter_sent_log_records{exporter="loki",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 7834
# HELP otelcol_exporter_sent_metric_points Number of metric points successfully sent to destination.
# TYPE otelcol_exporter_sent_metric_points counter
otelcol_exporter_sent_metric_points{exporter="prometheusremotewrite",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 1.603099e+06
# HELP otelcol_exporter_sent_spans Number of spans successfully sent to destination.
# TYPE otelcol_exporter_sent_spans counter
otelcol_exporter_sent_spans{exporter="otlp/tempo",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 205
otelcol_exporter_sent_spans{exporter="otlphttp/honeycomb",service_instance_id="9826fcc4-28c9-4a9f-a7da-681630ba9df0",service_name="otelcol-contrib",service_version="0.102.1"} 205

Unsure if this is due to an intentional change; any guidance would be appreciated.

Steps to Reproduce

  1. Deploy the v0.84.0 Docker image to EKS 1.28 using the ConfigMap shared below
  2. Verify metrics are exported to Cortex/Mimir
  3. Upgrade image to v0.102.1
  4. Restart deployment

Expected Result

Metrics about the telemetry data scraped and sent through the pipeline are exported.

Actual Result

Only clusters running opentelemetry-collector v0.84.0 have all the expected metrics exported successfully to Mimir via Prometheus remote write.

Collector version

v0.102.1

Environment information

Environment

EKS (Kubernetes 1.28), OpenTelemetry Collector Helm chart v0.93.3

OpenTelemetry Collector configuration

exporters:
  prometheusremotewrite:
    endpoint: https://${CORTEX_ENDPOINT}/api/v1/push
    headers:
      X-Scope-OrgID: ${K8S_CLUSTER_TENANT}
    timeout: 30s
processors:
  batch:
    send_batch_max_size: 1500
    send_batch_size: 1000
    timeout: 5s
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - action: keep
          regex: true
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
        - action: replace
          regex: (.+)
          source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $$1:$$2
          source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - action: replace
          source_labels:
          - __meta_kubernetes_namespace
          target_label: namespace
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_name
          target_label: pod
service:
  extensions:
  - health_check
  pipelines:
    metrics:
      exporters:
      - prometheusremotewrite
      processors:
      - memory_limiter
      - resourcedetection/eks
      - resourcedetection/ec2
      - attributes/common
      - attributes/replica
      - batch
      - metricstransform
      receivers:
      - otlp
      - prometheus

Log output

2024-07-02T00:29:19.051Z    info    prometheusreceiver@v0.102.0/metrics_receiver.go:344 Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-07-02T00:30:29.252Z    debug   scrape/scrape.go:1331   Scrape failed   {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "kubernetes-pods", "target": "http://{NODE_IP}:2020/api/v2/metrics/prometheus", "error": "Get \"http://{NODE_IP}:2020/api/v2/metrics/prometheus\": context deadline exceeded"}

Additional context

No response

github-actions[bot] commented 3 months ago

Pinging code owners:

dashpole commented 2 months ago

You need to scrape your self-observability metrics in the collector config if you want them. See https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/919e5a2d2d073b15098923fe1e2061309dde2fd8/receiver/prometheusreceiver/README.md?plain=1#L101
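For example, a minimal sketch using the existing prometheus receiver (the job name is a placeholder; the target assumes the default internal telemetry port 8888 and may need to be the pod IP rather than localhost in Kubernetes):

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: otel-collector        # placeholder job name
        scrape_interval: 60s
        static_configs:
        - targets:
          - localhost:8888              # default internal telemetry endpoint; adjust if overridden

The scraped series then flow through the same metrics pipeline to the prometheusremotewrite exporter.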