open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

Target allocator is ignoring serviceMonitorSelector settings #3383

Open tomk-gloat opened 2 days ago

tomk-gloat commented 2 days ago

Component(s)

target allocator

Describe the issue you're reporting

I have the target allocator set up with the following config:

    allocation_strategy: per-node
    filter_strategy: relabel-config
    collector_selector:
      matchlabels:
        app.kubernetes.io/name: opentelemetry-agent
        app.kubernetes.io/instance: opentelemetry
    prometheus_cr:
      scrape_interval: 30s
      pod_monitor_selector:
          EnableOtelCollector: yes
      service_monitor_selector:
          EnableOtelCollector: yes
    scrape_configs: []

With this setup the selector is ignored, and I can see the collector scraping all ServiceMonitor resources in the cluster.
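
For reference, the ServiceMonitors I expect to be picked up look roughly like this (names, namespaces, and ports are placeholders; the relevant part is the label on the ServiceMonitor's own metadata, since that is what the selector matches against):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-app            # placeholder
      namespace: default           # placeholder
      labels:
        EnableOtelCollector: "yes" # the label the selector should match
    spec:
      selector:
        matchLabels:
          app: example-app         # placeholder; selects the Service, unrelated to the allocator selector
      endpoints:
        - port: metrics
          interval: 30s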

I also tried adding matchlabels, since I found an example in one of the test files:

    allocation_strategy: per-node
    filter_strategy: relabel-config
    collector_selector:
      matchlabels:
        app.kubernetes.io/name: opentelemetry-agent
        app.kubernetes.io/instance: opentelemetry
    prometheus_cr:
      scrape_interval: 30s
      pod_monitor_selector:
          matchlabels:
            EnableOtelCollector: yes
      service_monitor_selector:
          matchlabels:
            EnableOtelCollector: yes
    scrape_configs: []

With that change it ignores all ServiceMonitors, even the ones that have the matching label.
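
One thing I haven't been able to rule out: depending on how the allocator parses its YAML, an unquoted yes may be read as a boolean rather than the string "yes" (Kubernetes label values are always strings), so a quoted variant might behave differently, e.g.:

    prometheus_cr:
      scrape_interval: 30s
      pod_monitor_selector:
        matchlabels:
          EnableOtelCollector: "yes"   # quoted so it stays a string regardless of the YAML parser
      service_monitor_selector:
        matchlabels:
          EnableOtelCollector: "yes"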

I also tried different label matchers, with no luck. I'm trying to find something in the logs that could help, but all I see is this:

{"level":"info","ts":"2024-10-23T10:29:48Z","msg":"Starting the Target Allocator"}
{"level":"info","ts":"2024-10-23T10:29:48Z","logger":"setup","msg":"Prometheus config empty, skipping initial discovery configuration"}
{"level":"info","ts":"2024-10-23T10:29:48Z","logger":"allocator","msg":"Starting server..."}
{"level":"info","ts":"2024-10-23T10:29:48Z","msg":"Waiting for caches to sync for namespace"}
{"level":"info","ts":"2024-10-23T10:29:48Z","msg":"Caches are synced for namespace"}
{"level":"info","ts":"2024-10-23T10:29:48Z","msg":"Waiting for caches to sync for podmonitors"}
{"level":"info","ts":"2024-10-23T10:29:48Z","msg":"Caches are synced for podmonitors"}
{"level":"info","ts":"2024-10-23T10:29:48Z","msg":"Waiting for caches to sync for servicemonitors"}
{"level":"info","ts":"2024-10-23T10:29:48Z","msg":"Caches are synced for servicemonitors"}
{"level":"info","ts":"2024-10-23T10:43:38Z","logger":"allocator","msg":"Could not assign targets for some jobs","allocator":"per-node","targets":4,"error":"could not find collector for node ip-10-122-54-130.eu-central-1.compute.internal\ncould not find collector for node ip-10-122-54-130.eu-central-1.compute.internal\ncould not find collector for node ip-10-122-54-130.eu-central-1.compute.internal\ncould not find collector for node ip-10-122-54-130.eu-central-1.compute.internal"}

Anything that would shed light on the issue, or an explanation of how to correctly set up the target allocator so it only collects metrics from selected sources, would be highly appreciated.

Thank you

swiatekm commented 2 days ago

The error in your log means that you don't have collector Pods on Nodes where Pods selected by your monitors are running. This may be related to the issue you're experiencing, but given that you're seeing no data whatsoever, this probably isn't the case. Are you sure the labels are correct? Can you also post the output of the /scrape_configs endpoint on the target allocator?
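
Something along these lines should get you that output, plus a quick check of where your collector Pods are actually running (names and namespace are placeholders, adjust them for your setup):

    # Forward the target allocator's HTTP port (8080 by default):
    kubectl -n <namespace> port-forward deploy/<collector-name>-targetallocator 8080:8080

    # In another shell: the effective scrape configs and the jobs the allocator discovered
    curl -s http://localhost:8080/scrape_configs
    curl -s http://localhost:8080/jobs

    # Check that a collector Pod runs on every Node that has scrape targets
    # (relates to the "could not find collector for node ..." errors):
    kubectl -n <namespace> get pods -o wide \
      -l app.kubernetes.io/name=opentelemetry-agent,app.kubernetes.io/instance=opentelemetry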