samber / awesome-prometheus-alerts

🚨 Collection of Prometheus alerting rules
https://samber.github.io/awesome-prometheus-alerts/

PrometheusTargetMissingWithWarmupTime check issue with multiple jobs on the same target #424

Open gskornowicz opened 1 week ago

gskornowicz commented 1 week ago

Hi, I'm wondering if I can do something to avoid this error:

Error executing query: found duplicate series for the match group {instance="server.company.com"} on the left hand-side of the operation: [{__name__="up", instance="server.company.com", job="bind", type="dns"}, {__name__="up", instance="server.company.com", job="node", os="Linux", type="vm"}];many-to-many matching not allowed: matching labels must be unique on one side

It's related to the PrometheusTargetMissingWithWarmupTime alert and its expression sum by (instance, job) ((up == 0) * on (instance) group_right(job) (node_time_seconds - node_boot_time_seconds > 600)), which, if I understand correctly, can match multiple up == 0 series when more than one job scrapes the same target. Is there any way to avoid or fix that?

samber commented 1 week ago

Do you have multiple Prometheus instances in federation mode, or a remote-write setup?

If so, add a label that distinguishes the two jobs or Prometheus instances.

gskornowicz commented 1 week ago

Hi @samber, I have neither multiple Prometheus instances nor a remote-write setup. It seems the problem is caused by having multiple jobs on one instance?
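One way to check this is to count the up series per instance. The queries below are only a diagnostic sketch; the instance value is copied from the error message above, and nothing else about the setup is assumed.

```promql
# Instances that have more than one `up` series, i.e. that are scraped
# by more than one job (or listed more than once under different labels).
count by (instance) (up) > 1

# List the individual series for the instance named in the error message.
up{instance="server.company.com"}
```

If the first query returns server.company.com, the left-hand side of the alert expression is not unique per instance, which is what the "duplicate series for the match group" error reports.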

samber commented 1 week ago

Yes, you may have multiple series with identical labels.

Do you use service discovery? Did you check whether an exporter endpoint is declared twice in prometheus.yml?
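For reference, the expression quoted earlier in the thread uses group_right, which requires the left-hand side (up == 0) to be unique per instance. A possible rewrite, offered as an untested sketch and not as the project's official rule, is to switch to group_left so that up is the "many" side. It assumes each instance exposes exactly one node_time_seconds and one node_boot_time_seconds series, i.e. a single node exporter job per host.

```promql
# Sketch only: several `up == 0` series per instance are allowed on the left;
# the right-hand side must be unique per instance (assumption: one node
# exporter job per host).
sum by (instance, job) (
  (up == 0)
    * on (instance) group_left ()
  (node_time_seconds - node_boot_time_seconds > 600)
)
```

With group_left the result keeps the labels of the up series, so instance and job are still available for the outer sum by. Like the original expression, this returns nothing for a host whose node exporter metrics are themselves missing.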