PrometheusTargetMissingWithWarmupTime check issue with multiple jobs on the same target

gskornowicz commented 5 months ago

Hi, I'm wondering if I can do something to avoid this error:

Error executing query: found duplicate series for the match group {instance="server.company.com"} on the left hand-side of the operation: [{__name__="up", instance="server.company.com", job="bind", type="dns"}, {__name__="up", instance="server.company.com", job="node", os="Linux", type="vm"}];many-to-many matching not allowed: matching labels must be unique on one side

It's related to the PrometheusTargetMissingWithWarmupTime alert and its expression sum by (instance, job) ((up == 0) * on (instance) group_right(job) (node_time_seconds - node_boot_time_seconds > 600)) which, if I understand correctly, can match multiple up == 0 series when more than one job runs on the same target. Is there any way to avoid or fix that?

samber commented 5 months ago

Do you have multiple Prometheus instances in federation mode or a remote-write setup?

If so, add a label to differentiate the jobs or Prometheus instances.

gskornowicz commented 5 months ago

Hi @samber, I have neither multiple Prometheus instances nor a remote-write setup. It seems the problem is caused by multiple jobs sharing one instance?

samber commented 5 months ago

Yes, you may have multiple series with identical labels.

Do you use service discovery? Did you check whether an exporter endpoint is declared twice in prometheus.yml?

gskornowicz commented 5 months ago

I don't use service discovery.

I don't see any duplicated exporter endpoints.

It's not clear to me what caused it, because the labels are not identical:

{__name__="up", instance="server.company.com", job="bind", type="dns"} <- the blackbox-dns exporter
{__name__="up", instance="server.company.com", job="node", os="Linux", type="vm"} <- the node exporter

The job and the type are different. Is it because the instance label is the same?

Attaching prometheus.yml:

global:
  scrape_interval: 20s
  scrape_timeout: 20s
  evaluation_interval: 15s

  external_labels:
    environment: prometheus.company.com

rule_files:
  - /etc/prometheus/rules/*.rules

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - prometheus.company.com:9093

scrape_configs:
  - job_name: prometheus
    metrics_path: /metrics
    static_configs:
    - targets:
      - prometheus.company.com:9090
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: grafana
    static_configs:
    - targets:
      - prometheus.company.com:3000
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: alertmanager
    static_configs:
    - targets:
      - prometheus.company.com:9093
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: metrics-snmp
    metrics_path: /metrics
    static_configs:
    - targets:
      - prometheus.company.com
    relabel_configs:
    - target_label: instance
      replacement: exporter
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: metrics-vmware
    metrics_path: /metrics
    static_configs:
    - targets:
      - prometheus.company.com
    relabel_configs:
    - target_label: instance
      replacement: exporter
    - target_label: __address__
      replacement: 127.0.0.1:9272
  - job_name: metrics-blackbox
    metrics_path: /metrics
    static_configs:
    - targets:
      - prometheus.company.com
    relabel_configs:
    - target_label: instance
      replacement: exporter
    - target_label: __address__
      replacement: 127.0.0.1:9115
  - job_name: node
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/node.yml
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: bind
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/bind.yml
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: snmp-idrac
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-idrac.yml
    params:
      module:
      - idrac
      auth:
      - idrac
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-synology
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-synology.yml
    params:
      module:
      - synology
      auth:
      - synology
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-wlan
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-wlan.yml
    params:
      module:
      - cisco
      auth:
      - cisco
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
    - source_labels:
      - __address__
      regex: .*:(.*)$
      replacement: $1
      target_label: __param_target
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-firewall
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-firewall.yml
    params:
      module:
      - barracuda
      auth:
      - barracuda
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
    - source_labels:
      - __address__
      regex: .*:(.*)$
      replacement: $1
      target_label: __param_target
    - source_labels:
      - instance
      regex: (.*).*1$
      replacement: primary
      target_label: boxrole
    - source_labels:
      - instance
      regex: (.*).*2$
      replacement: secondary
      target_label: boxrole
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-powerwalker
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-powerwalker.yml
    params:
      module:
      - powerwalker
      auth:
      - powerwalker
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
    - source_labels:
      - __address__
      regex: .*:(.*)$
      replacement: $1
      target_label: __param_target
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-switch
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-switch.yml
    params:
      module:
      - switch
      auth:
      - switch
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-brocade
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-brocade.yml
    scrape_interval: 30s
    scrape_timeout: 30s
    params:
      module:
      - brocade
      auth:
      - brocade
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: blackbox-http
    metrics_path: /probe
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/blackbox-http.yml
    relabel_configs:
    - source_labels:
      - module
      target_label: __param_module
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9115
  - job_name: blackbox-icmp
    metrics_path: /probe
    params:
      module:
      - icmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/blackbox-icmp.yml
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __address__
      target_label: instance
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
    - source_labels:
      - __address__
      regex: .*:(.*)$
      replacement: $1
      target_label: __param_target
    - target_label: __address__
      replacement: 127.0.0.1:9115
  - job_name: blackbox-dns
    metrics_path: /probe
    params:
      module:
      - dns
    static_configs:
    - targets:
      - one.one.one.one
      - ns1.company.com
      - ns2.company.com
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9115
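
On the question above about whether the shared instance label is the cause: that appears to be the case. The relabel_configs of the node and bind jobs both strip the port from __address__, so both up series end up with instance="server.company.com". The expression matches on (instance) only, so job and type are ignored, and group_right requires the left-hand side (up == 0) to be unique within each match group. When both jobs report up == 0 for the same host, the left-hand side has two series in one group and the query fails with the error quoted at the top.

One possible rewrite, offered as an untested sketch and not as the project's official fix, flips the matching so that up becomes the "many" side and collapses the uptime side to one series per instance. The choice of max as the aggregator is an assumption; any aggregation that leaves one series per instance would serve.

```promql
# Sketch only: not verified against this setup.
# "up == 0" may now contain several series per instance (one per job);
# the uptime side is reduced to exactly one series per instance.
sum by (instance, job) (
  (up == 0)
  * on (instance) group_left ()
  max by (instance) (node_time_seconds - node_boot_time_seconds > 600)
)
```

With group_left the job label already sits on the left-hand series, so nothing has to be copied across the match. The alternative is to keep the port in the instance label so that each job gets a distinct instance value, at the cost of changing the labels on existing series.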

samber/awesome-prometheus-alerts, issue #424