open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.97k stars 2.31k forks

OpenTelemetry exporter for Prometheus on Azure Kubernetes can't connect to Prometheus service #31914

Closed abkhan5 closed 6 months ago

abkhan5 commented 6 months ago

The OpenTelemetry exporter on Azure Kubernetes throws an error when trying to connect to Prometheus.

When the OpenTelemetry exporter is configured to connect to a Prometheus server on the same cluster, it throws the following error:

2024-03-22T11:50:22.849Z error exporterhelper/queue_sender.go:97 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlphttp/prometheus", "error": "not retryable error: Permanent error: error exporting items, request to http://prometheus-server.monitoring.svc.cluster.local:80/v1/metrics responded with HTTP Status Code 404", "dropped_items": 107}

Install Prometheus on Azure Kubernetes in the monitoring namespace using the Helm chart:

helm install prometheus prometheus-community/prometheus --debug -n monitoring

After installing Prometheus, this is what the services look like:

[screenshot: the Prometheus services created in the monitoring namespace]

Install the OpenTelemetry Collector using Helm with a values.yaml file that looks like the code below:

helm install opentelemetry-collector open-telemetry/opentelemetry-collector --debug -f values.yaml;

The values.yaml file looks like this

mode: deployment

presets:
  # enables the k8sattributesprocessor and adds it to the traces, metrics, and logs pipelines
  kubernetesAttributes:
    enabled: true
  # enables the kubeletstatsreceiver and adds it to the metrics pipelines
  kubeletMetrics:
    enabled: true
  # Enables the filelogreceiver and adds it to the logs pipelines
  logsCollection:
    enabled: true

config:
  processors:
    resourcedetection:
      detectors: [env, system]
    cumulativetodelta:
    batch:
      send_batch_max_size: 1000
      timeout: 30s
      send_batch_size: 800

    memory_limiter:
      check_interval: 1s
      limit_percentage: 70
      spike_limit_percentage: 30

  receivers:
    prometheus:
      config:  
        scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
          - targets: ['0.0.0.0:8888']
        - job_name: 'node-exporter'
          scrape_interval: 10s
          static_configs:
          - targets: ['0.0.0.0:9100']

    hostmetrics:
      collection_interval: 30s
      scrapers:
        cpu:
        disk:
        memory:
        load:
          cpu_average: true
    kubeletstats:
        collection_interval: 10s
        auth_type: 'serviceAccount'
        endpoint: '${env:K8S_NODE_NAME}:10250'
        insecure_skip_verify: true
        metric_groups:
            - node
            - pod
            - container

  exporters:
    otlphttp/prometheus:      
      endpoint: "http://prometheus-server.monitoring.svc.cluster.local:80"
      tls:
        insecure: true

    prometheusremotewrite:
      endpoint: http://prometheus-server.monitoring.cluster.local:9090/api/v1/push
      tls:
        insecure: true

    prometheus:
      endpoint: "prometheus-server.monitoring.svc.cluster.local:80"
      const_labels:
        label1: dev2
      send_timestamps: true
      metric_expiration: 180m
      enable_open_metrics: true
      add_metric_suffixes: false      
      resource_to_telemetry_conversion:
        enabled: true

  service:
    pipelines:
      metrics:
        processors: [cumulativetodelta, batch, resourcedetection, memory_limiter]
        receivers:
          - otlp
          - hostmetrics
          - kubeletstats
        exporters:
          - otlphttp/prometheus

I expected apps deployed on AKS to show up in Prometheus. I also expected the logs in the OpenTelemetry pod to show a successful connection, and for metrics to show up in Prometheus.

What I see instead is a 404 error when trying to connect to Prometheus. The errors look like this:

error exporterhelper/queue_sender.go:97 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlphttp/prometheus", "error": "not retryable error: Permanent error: error exporting items, request to http://prometheus-server.monitoring.svc.cluster.local:80/v1/metrics responded with HTTP Status Code 404", "dropped_items": 107}

While trying the prometheusremotewrite exporter I get the following error:

exporterhelper/queue_sender.go:97 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"system.disk.io\"; invalid temporality and type combination for metric \"system.disk.io_time\"; invalid temporality and type combination for metric \"system.disk.merged\"; invalid temporality and type combination for metric \"system.disk.operation_time\"; invalid temporality and type combination for metric \"system.disk.operations\"; invalid temporality and type combination for metric \"system.disk.weighted_io_time\"; invalid temporality and type combination for metric \"system.cpu.time\"; Permanent error: Permanent error: context deadline exceeded", "errorCauses": [{"error": "Permanent error: invalid temporality and type combination for metric \"system.disk.io\"; invalid temporality and type combination for metric \"system.disk.io_time\"; invalid temporality and type combination for metric \"system.disk.merged\"; invalid temporality and type combination for metric \"system.disk.operation_time\"; invalid temporality and type combination for metric \"system.disk.operations\"; invalid temporality and type combination for metric \"system.disk.weighted_io_time\"; invalid temporality and type combination for metric \"system.cpu.time\""}, {"error": "Permanent error: Permanent error: context deadline exceeded"}], "dropped_items": 107}

github-actions[bot] commented 6 months ago

Pinging code owners for receiver/prometheus: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 6 months ago

Pinging code owners for exporter/prometheus: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.

dashpole commented 6 months ago

Looks like this is actually using the otelhttp exporter?

abkhan5 commented 6 months ago

Looks like this is actually using the otelhttp exporter?

I've tried all three of the exporters mentioned. Each gives a different error message. The prometheus exporter gives the following:

no existing monitoring routine is r 2024/03/23 01:39:11 collector server run finished with error: cannot start pipelines: listen tcp 10.0.117.182:80: bind: cannot assign requested address;

The otlphttp/prometheus exporter gives this error:

exporterhelper/queue_sender.go:97 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlphttp/prometheus", "error": "not retryable error: Permanent error: error exporting items, request to http://prometheus-server.monitoring.svc.cluster.local:80/v1/metrics responded with HTTP Status Code 404", "dropped_items": 107}

and remotewrite gives:

exporterhelper/queue_sender.go:97 Exporting failed. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: invalid temporality and type combination for metric \"system.disk.io\"; invalid temporality and type combination for metric \"system.disk.io_time\"; invalid temporality and type combination for metric \"system.disk.merged\"; invalid temporality and type combination for metric \"system.disk.operation_time\"; invalid temporality and type combination for metric \"system.disk.operations\"; invalid temporality and type combination for metric \"system.disk.weighted_io_time\"; invalid temporality and type combination for metric \"system.cpu.time\"; Permanent error: Permanent error: context deadline exceeded", "errorCauses": [{"error": "Permanent error: invalid temporality and type combination for metric \"system.disk.io\"; invalid temporality and type combination for metric \"system.disk.io_time\"; invalid temporality and type combination for metric \"system.disk.merged\"; invalid temporality and type combination for metric \"system.disk.operation_time\"; invalid temporality and type combination for metric \"system.disk.operations\"; invalid temporality and type combination for metric \"system.disk.weighted_io_time\"; invalid temporality and type combination for metric \"system.cpu.time\""}, {"error": "Permanent error: Permanent error: context deadline exceeded"}], "dropped_items": 107}

dashpole commented 6 months ago

To send OTLP to Prometheus, you need to enable OTLP ingestion on the Prometheus server: https://prometheus.io/docs/prometheus/latest/feature_flags/#otlp-receiver
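For the prometheus-community/prometheus chart used above, that feature flag can be passed through the chart's values. A sketch, assuming the chart's server.extraFlags key (the flag name itself is from the Prometheus feature-flags docs):

```yaml
# values.yaml for the prometheus-community/prometheus Helm chart
server:
  extraFlags:
    - web.enable-lifecycle              # chart default, kept here
    - enable-feature=otlp-write-receiver  # enables the OTLP ingestion endpoint
```

With that enabled, Prometheus accepts OTLP at /api/v1/otlp/v1/metrics, so the otlphttp exporter endpoint would be something like http://prometheus-server.monitoring.svc.cluster.local:80/api/v1/otlp (the exporter appends /v1/metrics itself), which would also explain the 404 against the bare service root.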

To send any metrics to Prometheus today, you need to make sure the aggregation temporality is Cumulative, not Delta. The errors in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31914#issuecomment-2016294872 indicate you are trying to send Delta metrics.
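In the config above, the cumulativetodelta processor is what converts the hostmetrics sums to Delta before export. A sketch of the metrics pipeline with it removed (and memory_limiter moved first, as is conventional):

```yaml
service:
  pipelines:
    metrics:
      receivers: [otlp, hostmetrics, kubeletstats]
      # no cumulativetodelta: Prometheus requires Cumulative temporality
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [prometheusremotewrite]
```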

The prometheus exporter exposes a local endpoint on the collector (e.g. localhost:8080) which a Prometheus server can scrape. It doesn't really make sense to listen on a remote IP.
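A sketch of that pattern: the exporter listens on a local port inside the collector pod, and the Prometheus server scrapes it (the port 8889 and the collector service name below are arbitrary/hypothetical choices):

```yaml
# collector side: listen locally, don't point at the Prometheus service
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"

# Prometheus server side: scrape the collector's service (hypothetical name)
# scrape_configs:
#   - job_name: otel-collector-metrics
#     static_configs:
#       - targets: ['opentelemetry-collector.default.svc.cluster.local:8889']
```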