osism / kolla-operations

Repository for Grafana/Kibana dashboards and Prometheus alerting rules
https://www.osism.tech

Evaluating rule failed Alert: CephNodeDiskspaceWarning #34

Open Nils98Ar opened 1 year ago

Nils98Ar commented 1 year ago

I extracted the following from the output of docker logs prometheus_server. Any idea what the problem is and how it can be fixed?

found duplicate series for the match group {instance=\"<internal_address of monitoring node>:9100\"} on the right hand-side of the operation: [
    {__name__=\"node_uname_info\", domainname=\"(none)\", instance=\"<internal_address of monitoring node>:9100\", job=\"node\",              machine=\"x86_64\", nodename=\"<hostname of monitoring node>\", release=\"<kernel release monitoring node>\", sysname=\"Linux\", version=\"<kernel version monitoring node>\"},
    {__name__=\"node_uname_info\", domainname=\"(none)\", instance=\"<internal_address of monitoring node>:9100\", job=\"ceph_nodeexporter\", machine=\"x86_64\", nodename=\"<hostname of monitoring node>\", release=\"<kernel release monitoring node>\", sysname=\"Linux\", version=\"<kernel version monitoring node>\"}
];
many-to-many matching not allowed: matching labels must be unique on one side
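
For illustration, the uniqueness check behind this error can be sketched in Python. This is a simplified model of vector matching, not Prometheus's actual implementation. The helper name find_duplicate_groups and the instance value "monitoring:9100" are hypothetical stand-ins for the redacted values in the log.

```python
from collections import defaultdict

# Right-hand side series from the error, reduced to the relevant labels.
# The instance value is a placeholder for the redacted address in the log.
rhs_series = [
    {"__name__": "node_uname_info", "instance": "monitoring:9100", "job": "node"},
    {"__name__": "node_uname_info", "instance": "monitoring:9100", "job": "ceph_nodeexporter"},
]


def find_duplicate_groups(series, on):
    """Group series by the labels in `on` and return groups with more than one member."""
    groups = defaultdict(list)
    for labels in series:
        # The match-group key is built only from the labels listed in `on`.
        key = tuple((name, labels.get(name, "")) for name in on)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Matching on instance alone: both series fall into the same match group.
dupes_instance = find_duplicate_groups(rhs_series, on=["instance"])
# Matching on instance and job: each series gets its own match group.
dupes_instance_job = find_duplicate_groups(rhs_series, on=["instance", "job"])

print(len(dupes_instance))      # 1
print(len(dupes_instance_job))  # 0
```

In this model the collision comes from the same node exporter target being scraped under two jobs, node and ceph_nodeexporter, while the rule matches on instance only. Scraping the target under a single job, or matching on job as well, might avoid it, but neither was confirmed in this thread.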

I have tried the ceph.rules from kolla-operations as well as this one from the upstream project: https://github.com/ceph/ceph/blob/7ae97667c9b7e4d86bb8976c2a96700aa3d4b1ce/monitoring/ceph-mixin/prometheus_alerts.yml

I'm not very familiar with Prometheus...

Nils98Ar commented 11 months ago

@berendt @michaelbayr

Any hint on where I could look next? Am I right that you don't see this "duplicate series for the match group" problem in your environments?

This seems to be the last "false positive" alert in our environment.