Open andrew-barnett opened 2 years ago
..."unwatching => watching => added" cycle repeats 6 more times, about every 3 to 5 minutes...
Are your logs rotating this frequently?
Mezmo support told us they had noticed an issue handling symlinked files in 3.6 and 3.7 and asked us to downgrade to 3.5. That worked for us. They recently released 3.8, which is supposed to fix this issue. We're testing 3.8 in our environment to confirm.
We are running logdna-agent v3.7.0 on a k8s cluster with 3 nodes. As the agent process runs, it gradually loses track of log files and eventually stops exporting.
For example, we have a specific service launched on our cluster that is running on a specific node as a specific container. The logdna-agent log lines that mention this container, as of almost 24 hours later (2022-11-03T14:00:00Z), are as follows:
So -- logdna-agent stopped watching this file as of 2022-11-02T16:04:17Z and then never picked it back up. I've logged in to the node and confirmed that this log file still exists and has lines in it from after this time -- almost 24 hours worth of lines.
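For anyone who wants to repeat that check on their own node, here is a rough sketch of a script that does it. It is not part of logdna-agent; the function name `log_status` and the example path are placeholders made up for illustration. It assumes the container log is reached through a symlink, follows the link, and reports whether the target exists and was modified after a given UTC timestamp.

```python
import os
from datetime import datetime, timezone


def log_status(path, cutoff_iso):
    """Return (resolved_path, exists, modified_after_cutoff) for a log file.

    `path` may be a symlink; `cutoff_iso` is a UTC timestamp such as
    "2022-11-02T16:04:17Z". Hypothetical helper, not an agent API.
    """
    cutoff = datetime.strptime(cutoff_iso, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    # Follow the whole symlink chain to the real file on disk.
    resolved = os.path.realpath(path)
    if not os.path.exists(resolved):
        return resolved, False, False
    mtime = datetime.fromtimestamp(os.stat(resolved).st_mtime, tz=timezone.utc)
    return resolved, True, mtime > cutoff


if __name__ == "__main__":
    # Placeholder path: substitute the real container log path on your node.
    example = "/var/log/containers/EXAMPLE.log"
    print(log_status(example, "2022-11-02T16:04:17Z"))
```

If the third value comes back `True` while the agent shows no "watching" or "added" line for that file after the cutoff, the file is still being written to but the agent is no longer following it.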
The instability appears to have started with the log rotation process that happened at 2022-11-02T14:55:29Z, though it does seem the agent recovered just fine.
The k8s node is running Ubuntu 20.04.2 LTS (GNU/Linux 5.8.0-1041-aws x86_64). Disk, memory and cpu are all well below max.