Open QuentinBisson opened 2 years ago
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!
Do we really want this one to be closed? This has a real usage impact IMO, and correct error logging would really help with using grafana-agent.
👋 Closing an issue as stale doesn't mean we don't want to do it or that we don't think it's important, just that it's not currently prioritized.
We're also currently exploring whether it makes sense for us to get rid of the stalebot and change how we manage the issue queue. In the meantime, I'll reopen this and tag it keepalive for now.
After a live discussion in our community call, it seems like the biggest issue here is that Kubernetes SD is hiding errors when it can't connect to Kubernetes. This is something we'll need to fix upstream so that the discovery errors get exposed as log lines.
I've run into this again and tracked down why nothing gets logged. It turns out that Kubernetes' client is hiding the error logs by default, and you have to explicitly install a hook to get them via SetWatchErrorHandler. This is still something that needs to be fixed upstream in Prometheus though.
I tested this and can confirm that it makes errors show up in logs. Agreed that this needs to be fixed upstream.
Hi there :wave:
On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.
To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)
When trying out agent v0.26.1 on a cluster with invalid RBAC configured (i.e. an invalid service account), the agent does not log any errors, which makes it impossible to debug what is failing:
Meanwhile, with the same service monitors, Prometheus Operator in agent mode does log those errors: