rknightion opened 1 year ago
Can you check whether the custom metrics that trigger the external_labels-missing alerts are also missing the aforementioned labels? If not, could you add another prometheus.scrape section that sends the scraped metrics to the prometheus.remote_write one?
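For illustration, a minimal flow-mode sketch of that suggestion (the component labels, discovery setup, endpoint URL, and label values here are made up, not taken from the reporter's config):

```river
// Hypothetical pod discovery; any target list works here.
discovery.kubernetes "pods" {
  role = "pod"
}

// Scrape the custom apps and forward them to the same remote_write
// component, so the series pick up its external_labels on write.
prometheus.scrape "custom_apps" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "https://mimir.example.com/api/v1/push"
  }
  external_labels = {
    cluster = "prod-eks",
  }
}
```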
IMO, the "built-in" metrics possibly pass through the prometheus.remote_write block and hence inherit the external_labels. The labels you're seeing on the alerts are probably coming from the triggering metrics, as described in Prometheus's Alerting Rules doc.
@hainenber I've checked the underlying metrics that triggered both sets of alerts and they all seem to have all of the external_labels (which is part of what confuses me: if the underlying metrics have the labels, I would have expected the alerts to have them as well).
@rknightion Can you provide the definition of one of the alerts where you do not see the expected external_labels?
If the alert rule is aggregating away the external_labels, then they wouldn't appear in the alerts.
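To illustrate that point (these are hypothetical rules, not taken from the reporter's setup): an expression that aggregates over all labels produces alerts without the external_labels that were present on the underlying series, while an aggregation that lists a label in its by clause keeps it:

```yaml
groups:
  - name: example
    rules:
      # count() with no grouping drops every label (cluster, customer, ...),
      # so the resulting alert carries none of the external_labels.
      - alert: TargetsDownNoLabels
        expr: count(up == 0) > 0
      # Keeping cluster in the aggregation preserves it on the alert.
      - alert: TargetsDownPerCluster
        expr: count by (cluster) (up == 0) > 0
```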
This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!
Hi there :wave:
On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.
To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)
What's wrong?
When making use of mimir.rules.kubernetes to sync PrometheusRules, some rules sync with the expected external_labels applied during the remote write step, whereas others are completely missing these labels.
As a result, some of our eventual groupings in Mimir Alertmanager miss these rules.
So far, the main difference I've noticed between the two sets of rules is that "built in" rules (from kubernetes-mixin, for example) seem to retain the labels, whereas rules created by our own Helm charts or via other upstream projects do not.
These labels do not appear in any of the PrometheusRule CRDs themselves, so I assume they are being added by the remote-write capability, and that if there is a bug it is that the external_labels aren't being applied to all synced rules.
Steps to reproduce
Set up mimir.rules.kubernetes to sync all PrometheusRules, with external labels configured on remote_write.
When it runs, some synced rules have the external labels and some do not. In the screenshot from the alert grouping page, the alerts at the top are ungrouped despite coming from the same clusters as the grouped alerts. The labels customer, envtype, mimircluster, product and cluster have been added to the synced alerts in the bottom half but not to the ones at the top. I think this difference in behaviour indicates a bug, or at least a divergence that I can't find documented.
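For reference, a minimal sketch of the kind of setup described (the address, URL, and label values are placeholders, not the reporter's actual config). Note that mimir.rules.kubernetes and prometheus.remote_write are independent components: external_labels is attached to the written series, not to the synced rule definitions themselves.

```river
// Sync every PrometheusRule CRD into the Mimir ruler.
mimir.rules.kubernetes "all_rules" {
  address = "https://mimir.example.com"
}

// Write scraped series to Mimir; external_labels is applied to the
// series here, not to the rules synced by the component above.
prometheus.remote_write "mimir" {
  endpoint {
    url = "https://mimir.example.com/api/v1/push"
  }
  external_labels = {
    customer = "acme",
    cluster  = "prod-eks",
  }
}
```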
System information
EKS 1.24
Software version
v0.35.2 provided by the k8s-monitoring helm chart
Configuration
No response
Logs