Open kavirajk opened 1 year ago
I think this log pending issue with promtail + logrotation is also related to this issue I explained here.
Hey @kavirajk. I managed to reproduce the issue as well. However, every approach we used in the test has gaps...
❯ ls -la | grep generated.log
-rw-r--r-- 1 vdiachenko admin 78999088 Dec 22 10:18 fromcmdtail-generated.log
-rw-r--r-- 1 vdiachenko admin 78999121 Dec 22 10:18 tailer-inotify-generated.log
-rw-r--r-- 1 vdiachenko admin 77134964 Dec 22 10:18 tailer-polling-generated.log
-rw-r--r-- 1 vdiachenko admin 78999636 Dec 22 10:18 untouched-generated.log
As you can see, untouched-generated.log has a size of 78999636, but the rest of the files have different sizes, even the file from tail -f (fromcmdtail-generated.log). The inotify tailer is closest to the original size, but not exact...
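To put numbers on the gaps, here is a minimal Go sketch that subtracts each captured size from the untouched file's size, using the byte counts from the listing above. The helper name lostBytes is made up for illustration, and the size difference only equals the amount of lost log data if the captured files contain nothing but lines from the original (no duplicates or partial lines).

```go
package main

import "fmt"

// lostBytes returns how many bytes a captured copy is missing relative to the
// reference file. It returns 0 if the copy is not smaller than the reference.
// (Hypothetical helper, only for this comparison.)
func lostBytes(reference, captured int64) int64 {
	if captured >= reference {
		return 0
	}
	return reference - captured
}

func main() {
	// Size of untouched-generated.log, taken from the ls output above.
	const untouched int64 = 78999636

	captures := []struct {
		name string
		size int64
	}{
		{"fromcmdtail-generated.log", 78999088},
		{"tailer-inotify-generated.log", 78999121},
		{"tailer-polling-generated.log", 77134964},
	}

	for _, c := range captures {
		lost := lostBytes(untouched, c.size)
		pct := 100 * float64(lost) / float64(untouched)
		fmt.Printf("%-30s missing %8d bytes (%.4f%%)\n", c.name, lost, pct)
	}
}
```

With these sizes the differences are 548 bytes for tail -f, 515 bytes for the inotify tailer and 1864672 bytes (roughly 2.4%) for the polling tailer.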
With any solution, we might get gaps if we just truncate the original log file... I think if we tune the polling tailer, it could also work for us...
However, the root cause of the issue is the way the log files are rotated... I think we should support only the mode where the current log file is moved (renamed) and a new (empty) file is created for the logs... In this case, the tailer can continue reading data from the renamed file (because it still holds an open file handle to it), and afterwards it will reopen the newly created file... https://github.com/grafana/tail/blob/master/tail.go#L211 But even in this case there might be gaps, not at the tailer level but at the log rotation level, when the old file has been moved but the new one has not been created yet... This case is described in more detail here: https://www.datadoghq.com/blog/log-file-control-with-logrotate/#create-or-copy-log-files-to-manage-rotation
Thanks @vlad-diachenko :)
Agree that there is no solution that fixes the problem 100%, given that the logs are rotated frequently by just dropping the old log contents without copying them. This particular proposal was just to show how inotify performed better than polling w.r.t. single-file log rotation (even in your experiment, the log loss is greater with polling than with inotify, if you notice).
But then, looking at the bigger picture, I started reading about the unstable nature of inotify (it looks like there is a hard limit on how many events it can be notified about, which can easily break at scale), plus resource consumption can be huge. So integrating it with a tool like promtail, whose goal is to scrape a lot of log files, can lead to worse behaviour.
NOTE: Totally agree that we should have some recommendation on "how to logrotate" effectively. However, the thing that led to this experiment in the first place is that, with the exact same log rotation setup (replacing the old log file without a copy), FileBeat is able to fetch the logs whereas Promtail couldn't.
I see two different action items here.
1. Document how to log rotate effectively (just immediate truncate without copy is bad)
2. Do our best with promtail polling to avoid skipping logs as much as possible, even if logs are rotated frequently by just truncating them
For (2) I want to do
- Make the batch size of promtail configurable, to get higher throughput and also lessen the impact on polling so it can keep up.
- Spin up a tailer per file (similar to what Filebeat does)
Sounds great ;) let me know If I can help with anything ) I find it interesting so we could make it together if you wish ;)
tldr;
Promtail uses the tail package as a dependency to read files and watch for changes. The tail package supports two ways of watching file changes (say, when a new line is appended, the file is removed, the file is rotated, etc.): polling and inotify. Promtail uses polling all the time (it is even hard-coded).
The problem is that the tailer performs poorly with polling, particularly when a log file is rotated very frequently. This leads to some log lines being skipped in such cases. The better approach is to use an inotify-based watcher.
NOTE: The Linux tail command uses an inotify-based watcher.
Experiment to justify the problem
NOTE: Experiment is performed on Linux machine running MANJARO with kernel 5.15.81
The gist is that we generate logs that are logrotated frequently (when the size exceeds 1M), and we create our custom tailer with two different modes (polling and inotify) to understand the difference. Finally, we compare the total volume of logs across 4 things:
1. The untouched generated log
2. Our tailer in polling mode
3. Our tailer in inotify mode
4. Linux tail
We use (1) and (4) to judge whether any logs are missed by our tailer in the different modes.
Tailer in polling mode
We use the following snippet with the exact config promtail currently uses
Tailer in inotify mode
Same as above with Poll: false.
Logrotate config
We use the following logrotate config to rotate the generated log (/home/kavirajk/generated.log) when its size exceeds 1M
NOTE: Here, when the file reaches 1M, the original file is just truncated without moving the old contents to any new file (say generated.log.1)
Steps
1. Run the tailer in polling mode, capturing its output in tailer-polling-generated.log
2. Run the tailer in inotify mode: ./tailer-inotify | tee /home/kavirajk/tailer-inotify-generated.log
3. Run Linux tail: tail -f /home/kavirajk/generated.log | tee fromcmdtail-generated.log
4. Run logrotate frequently: watch -n 1 logrotate generated.conf -s .state.logrotate.status -v
5. Generate the logs: time flog -f json -d 1ms -w -l | tee /home/kavirajk/untouched-generated.log >> /home/kavirajk/generated.log
~$ ls -l *-generated.log
.rw-r--r-- 116M kavirajk 19 dec 08:55 fromcmdtail-generated.log
.rw-r--r-- 116M kavirajk 19 dec 08:55 tailer-inotify-generated.log
.rw-r--r-- 113M kavirajk 19 dec 08:55 tailer-polling-generated.log
.rw-r--r-- 116M kavirajk 19 dec 08:55 untouched-generated.log