thampiotr opened this issue 5 months ago
There is a workaround for now: set the instance label to a common value for all instances in the cluster using the discovery.relabel component. For example, this component sets it to "alloy-cluster":
discovery.relabel "replace_instance" {
  targets = discovery.file.targets.targets

  rule {
    action        = "replace"
    source_labels = ["instance"]
    target_label  = "instance"
    replacement   = "alloy-cluster"
  }
}
You'd add the above component between your exporters and the prometheus.scrape component, for example as sketched below.
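As a rough sketch of that wiring (the prometheus.exporter.unix exporter and the prometheus.remote_write component named "default" are assumptions for illustration, not part of the original report), the pipeline could look like this:

// Hypothetical wiring: exporter targets -> relabel -> scrape.
prometheus.exporter.unix "node" { }

discovery.relabel "replace_instance" {
  targets = prometheus.exporter.unix.node.targets

  rule {
    action       = "replace"
    target_label = "instance"
    replacement  = "alloy-cluster"
  }
}

prometheus.scrape "node" {
  // Scrape the relabeled targets so every cluster member sees the
  // same target set, and forward samples to remote write.
  targets    = discovery.relabel.replace_instance.output
  forward_to = [prometheus.remote_write.default.receiver]
}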
A longer-term fix could also come from https://github.com/grafana/alloy/issues/399. Regardless, we should have good documentation to ensure users don't fall into this pit.
This affected our 30+ blackbox probes across ~7 Alloy instances deployed via Helm: we were missing random targets, which triggered DatasourceNoData in our alerting. The workaround fixed it.
TBH, this proposed rule is not a workaround. Using it breaks multiple dashboards and alerts, since we can no longer distinguish which node is running node-exporter.
What's wrong?
Most embedded Prometheus exporters set the instance label to the hostname where Alloy runs. This breaks, in a subtle but significant way, the fundamental clustering assumption that all instances have the same configuration: the exporters implicitly inject the hostname as an instance label, but instances usually have different hostnames. This leads either to some metrics not being scraped at all, or to unnecessary duplicate scraping with different instance labels.
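For illustration only, a minimal sketch of the kind of clustered configuration that runs into this (the prometheus.exporter.unix exporter and the prometheus.remote_write component named "default" are assumptions, not taken from the report):

// Each Alloy instance injects its own hostname as the "instance" label
// on the exporter's targets, so clustered target assignment sees a
// different target set on every node.
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.remote_write.default.receiver]

  clustering {
    enabled = true
  }
}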
Steps to reproduce
The issue was discussed in this PR, but we decided to move the conversation here for better tracking and to provide a place to refer to for workarounds.