open-telemetry / opentelemetry-collector-contrib


connection refused while scraping for kube scheduler metrics #35959

Open shrutichy91 opened 1 month ago

shrutichy91 commented 1 month ago

Component(s)

receiver/prometheus

Describe the issue you're reporting

I have a 3-node k8s cluster. I am running the OpenTelemetry Collector as a DaemonSet with the following config:

extensions:
  # The health_check extension is mandatory for this chart.
  # Without the health_check extension the collector will fail the readiness and liveness probes.
  # The health_check extension can be modified, but should never be removed.
  health_check: {}
  memory_ballast: {}
  bearertokenauth:
    token: "XXXXXX"

processors:
  batch:
    timeout: 1s
    send_batch_size: 1000
    send_batch_max_size: 2000
  # If set to null, will be overridden with values based on k8s resource limits

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kube-scheduler-nodeport
          honor_labels: true
          kubernetes_sd_configs:
            - role: pod
              namespaces:
                names:
                  - kube-system
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          scheme: https
          tls_config:
            insecure_skip_verify: true
          relabel_configs:
            # Keep pods with the specified labels
            - source_labels:
                [
                  __meta_kubernetes_pod_label_component,
                  __meta_kubernetes_pod_label_tier,
                ]
              action: keep
              regex: kube-scheduler;control-plane
            - source_labels: [__meta_kubernetes_pod_ip]
              action: replace
              target_label: __address__
              regex: (.*)
              replacement: $$1:10259
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318

exporters:
  logging: {}
  prometheusremotewrite:
    endpoint: "xxxxxxx"
    resource_to_telemetry_conversion:
      enabled: true
    tls:
      insecure: true
    auth:
      authenticator: bearertokenauth

service:
  telemetry:
    metrics:
      address: ${env:MY_POD_IP}:8888
    logs:
      level: debug
  extensions:
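For context on the relabeling above: the keep rule matches only pods carrying the labels component=kube-scheduler and tier=control-plane (the labels kubeadm puts on its static pods), and the replace rule rewrites the scrape address to <pod IP>:10259. The doubled $$ is needed because the collector's config loader treats $ as the start of a variable reference, so $$1 passes a literal $1 through to Prometheus. As a quick sanity check that the keep rule actually matches your scheduler pods (label names assumed from a kubeadm-style cluster):

```sh
# Should list one kube-scheduler pod per control-plane node, with its pod IP.
kubectl -n kube-system get pods \
  -l component=kube-scheduler,tier=control-plane -o wide
```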

I get the below error.

2024-10-23T12:45:56.402Z debug scrape/scrape.go:1331 Scrape failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "kube-scheduler", "target": "https://100.xx.xx.xx:10259/metrics", "error": "Get \"https://100.xx.xx.xx:10259/metrics\": dial tcp 100.xx.xx.xx:10259: connect: connection refused"}

I have the kube-scheduler running as three pods, one per node, on the 3-node cluster in the kube-system namespace. Do I need a k8s Service of type NodePort to get this to work?

I tried logging in to the node and running curl -kvv https://100.xx.xx.xx:10259/metrics; I get connection refused, but it does work with curl -kvv https://localhost:10259/metrics.
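That symptom (connection refused on the pod IP, but localhost working) usually means the process is listening only on 127.0.0.1. On kubeadm clusters the kube-scheduler static pod defaults to --bind-address=127.0.0.1, so port 10259 is unreachable from any other address. A minimal sketch of checking and, if the exposure is acceptable, widening it, assuming a kubeadm-style static pod manifest at /etc/kubernetes/manifests/kube-scheduler.yaml:

```sh
# On each control-plane node, check which address the scheduler binds to.
grep bind-address /etc/kubernetes/manifests/kube-scheduler.yaml

# If it prints --bind-address=127.0.0.1, the metrics port is node-local only.
# Editing the static pod manifest makes the kubelet restart the scheduler.
# Note: 0.0.0.0 exposes the HTTPS endpoint on all interfaces; the endpoint
# still requires a valid bearer token, but consider a NetworkPolicy as well.
sudo sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/' \
  /etc/kubernetes/manifests/kube-scheduler.yaml
```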

github-actions[bot] commented 1 month ago

Pinging code owners:

dashpole commented 1 month ago

Is the metrics port exposed on the scheduler pod? You shouldn't need a service if the scheduler is running in-cluster.
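One way to verify reachability from inside the cluster, as a sketch (substitute a real scheduler pod IP; curlimages/curl is just a convenient image that ships curl):

```sh
# A 401/403 response or metrics output means the port is reachable;
# "connection refused" means nothing is listening on that address.
kubectl run curl-probe --rm -it --restart=Never --image=curlimages/curl \
  --command -- curl -k https://<scheduler-pod-ip>:10259/metrics
```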

Juliaj commented 3 weeks ago

In our environment, this is reproducible with build 0.111.0 and not reproducible with 0.110.0.