Open makeittotop opened 6 months ago
From whatever I can tell with my limited knowledge of Go and channels, it appears that there are 2 goroutines (in this case) — one for localhost:13100, the other for logs.my-loki-instance.net — in the grafana-agent process. Both of them read from the same channel (api.Entry), which is populated by the readLines() function in the promtail package (grafana/clients/pkg/promtail/targets/file/tailer.go). As the localhost:13100 goroutine gets blocked by falling into retries and exponential backoffs, it delays the other (my-loki) goroutine from receiving data too; at least my tests confirm this. Is this because the underlying api.Entry channel is "full" while one of the 2 receivers is tied up elsewhere? My tests show that as soon as the failing goroutine unblocks after exhausting its retries, both receivers receive data pretty much immediately.
Hi there :wave:
On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.
To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)
This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!
What's wrong?
I've noticed that, in a setup with multiple Loki clients to forward logs to, if one of the Loki clients starts failing for some reason (e.g. no process listening on the specified port), it starves the other, working Loki endpoints of data until the failing client exhausts all of its max_retries (default = 10). Once that loop resets, the same issue repeats. In the end, the working clients only receive data every ~6 minutes or so, depending on what max_period is set to (default = 5m). This also leads to "gaps" in the Grafana dashboard when looking at the data for those clients.
Steps to reproduce
Take a look at this nominal config -
./agent-local-config.yaml
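The config file itself is not reproduced here; as an illustration only, a two-client Grafana Agent (static mode) logs config of the kind described might look roughly like the following (paths, names, and the scrape target are assumptions; the two client URLs are the endpoints mentioned in this report):

```yaml
logs:
  configs:
    - name: default
      positions:
        filename: /tmp/positions.yaml
      scrape_configs:
        - job_name: varlogs
          static_configs:
            - targets: [localhost]
              labels:
                job: varlogs
                __path__: /var/log/*.log
      clients:
        # Both clients receive the same entries; if one endpoint is down,
        # its retries can stall delivery to the other.
        - url: http://localhost:13100/loki/api/v1/push
        - url: https://logs.my-loki-instance.net/loki/api/v1/push
```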
Start the agent as
Now, let's assume that the localhost:13100 instance is missing for some reason. In that case, I expected the other endpoint (logs.my-loki-instance) to still receive data at the configured scrape interval (60s), but that doesn't happen, as explained above.
System information
Linux 6.5.0-15-generic
Software version
Grafana Agent 0.35.0 and master atm
Configuration