Open Ccccclong opened 11 months ago
This is likely due to a deadlock. Once the request starts timing out, can you share a goroutine dump by going to /debug/pprof/goroutine?debug=1 and /debug/pprof/goroutine?debug=2 on the agent's container's HTTP server? Having both dumps (one has more info but can be more tedious to read) will be helpful for us to track it down.
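For anyone who wants to grab both dumps in one go, here is a minimal sketch. The two paths come from the comment above. The base URL (host and port of the agent's HTTP server), the output file names, and the helper names `dump_urls`, `http_get`, and `save_goroutine_dumps` are assumptions made for illustration, not part of Grafana Agent.

```python
import urllib.request
from pathlib import Path

# Output file name -> pprof path (paths as requested in the comment above).
# The file names are arbitrary choices for this sketch.
DUMP_PATHS = {
    "goroutine-debug1.txt": "/debug/pprof/goroutine?debug=1",
    "goroutine-debug2.txt": "/debug/pprof/goroutine?debug=2",
}


def dump_urls(base_url):
    """Build the full dump URLs for an agent HTTP server at base_url."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in DUMP_PATHS.items()}


def http_get(url, timeout=30.0):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_goroutine_dumps(base_url, out_dir=".", fetch=http_get):
    """Download both goroutine dumps and write them into out_dir.

    `fetch` is injectable so the function can be exercised without a
    running agent.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, url in dump_urls(base_url).items():
        target = out / name
        target.write_bytes(fetch(url))
        written.append(target)
    return written
```

Calling `save_goroutine_dumps("http://localhost:12345")` would write both files to the current directory, assuming the agent's HTTP server is reachable at that address (for example through a port-forward); adjust the address to your setup.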
Sorry for the late response. This problem happens randomly and it hasn't appeared in the past few weeks.
Additionally, this log is from Grafana Agent Helm chart version 0.3.11. I recently upgraded it in the hope that it will solve the issue.
Thanks for the help.
Are you ensuring the previous reload was complete before calling the next?
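To illustrate what "ensuring the previous reload was complete" could look like on the caller's side, here is a rough sketch of a client that never has more than one reload in flight. This is not how config-reloader is implemented; the class name `SerializedReloader`, the helper `post_reload`, and the timeout value are hypothetical, and only the `POST /-/reload` endpoint comes from this issue.

```python
import threading
import urllib.request


def post_reload(url, timeout=10.0):
    """Send an empty POST to the reload endpoint and return the HTTP status."""
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


class SerializedReloader:
    """Allow at most one reload request in flight at a time."""

    def __init__(self, url, send=post_reload):
        self._url = url
        self._send = send  # injectable so the class can be tested offline
        self._lock = threading.Lock()

    def reload(self):
        """Return the HTTP status, or None if a reload is still running."""
        # Non-blocking acquire: skip this call instead of queueing behind
        # a reload that has not finished yet.
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return self._send(self._url)
        finally:
            self._lock.release()
```

Skipping rather than queueing is a design choice for this sketch: a skipped reload is picked up by the next periodic call, whereas queued calls could pile up behind a request that never returns.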
This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!
Hi there :wave:
On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.
To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)
What's wrong?
The POST /-/reload endpoint, which is called periodically by config-reloader, randomly starts to time out, and all subsequent calls to the endpoint then time out indefinitely. As a result, the Grafana Agent does not collect logs from new pods.
Steps to reproduce
The issue tends to happen more quickly if you manually call the POST /-/reload endpoint frequently, e.g. 100 calls/second.
System information
Ubuntu 22.04 x86_64
Software version
Grafana Agent Operator Helm Chart 0.2.15, on RKE2 Cluster 1.25.3
Configuration
Logs