dhorbach closed this issue 5 months ago
Thank you for the report @dhorbach, we'll look into this.
When we forked Promtail code into the Agent's components, we chose a slightly different way to schedule targets as tasks, and we might have missed something here.
We've done some preliminary investigation into this, and found that the issue seems to be that there is no mechanism for a container to be re-tailed after the log stream closes.
This is composed of two smaller issues:
The reason this happens with Docker and not with Kubernetes is mainly because the target is the same after a container restarts with Docker, but this isn't true with Kubernetes, where the ID changes.
We haven't identified the right way to fix the bug yet, but we do have a fair amount of confidence that modifying those two pieces above will lead us to the fix.
Just stumbled into this issue as well. I'm planning to replace node exporter, cAdvisor and Promtail with Grafana Agent on hosts with Docker discovery, and this is a showstopper for now (as we want to capture logs after Docker containers crash or restart normally).
I believe this issue is what I'm encountering when the Docker hosts are replaced (as part of an autoscaling ECS cluster, for example). The Grafana Agent (flow mode) starts first, then Docker, and, until the agent is restarted, no logs flow. Strangely, metrics work fine, and there's nothing in the agent's logs to indicate it isn't working properly.
Hi there :wave:
On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.
To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)
What's wrong?
After restarting one of the containers from which logs are scraped, new logs are not processed. Grafana Agent is installed via the system package on Amazon Linux. Reloading the service configuration doesn't help.
https://github.com/grafana/loki/issues/5259 might be relevant
Steps to reproduce
System information
Linux ip-10-10-101-40.ec2.internal 6.1.66-93.164.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 2 23:50:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Software version
agent, version v0.39.0 (branch: HEAD, revision: 402672cb)
Configuration
Logs
No response