Open shootkin opened 4 years ago
This is a limitation from the fsnotify we are using. https://github.com/fsnotify/fsnotify
Seems like we need an option to allow for polling instead of using this library. That sounds like a lot of work.
Hey @shootkin,
I've done some investigations and this should not affect the tailing of your 4 files, it should still work even though this errors show up.
Basically we use fsnotify to pick up quickly new files, this is to solve the use case where files are showing up and disappearing quickly. But we always actually sync all files manually every 10 seconds by default. (you can change that if you want https://github.com/grafana/loki/blob/master/docs/clients/promtail/configuration.md#target_config)
Regarding tailing files, we always poll every 100ms.
I'm going to work on a PR to make the fsnotify optional so that you can deactivate it since in your use case you don't need this.
Also you should definitively increase your batch size with that volume. (https://github.com/grafana/loki/blob/master/docs/clients/promtail/configuration.md#client_config)
Is this log noisy ? I was thinking may be we should also throttle that error.
Let me know if that works for you.
Hi @cyriltovena, Thank you for investigation! So it means that the slowness is not due to this error. I also think it is better to remove this error from logs, cause it can mislead.
I increased batch size to 20 megabytes, but the slowness still persists. It matters for me, because I use logrotate, and if promtail is slow, then it will miss some log files.
Promtail tries to read log file to its end despite of the fact that this file was rotated (and it is good behaviour). But while promtail is processing old log file to its end, newly created log file is rotated too. Then promtail gets only the third file, and missed the second.
For now, promtail works for me only if I create separate instance of promtail for each log file. And the rate limit which I see is about 30 000 lines per second. Above this limit, promtail can miss some log files because of logrotate.
How does promtail misses the second file? Does it gets deleted?
Here is what actually happens:
So, I have 1, 2, and 4 files on loki, but 3rd file is missed.
May I bring up the suggestion to add inode numbers to the tracking of files to read and ship to make rotations of files a lot less error prone? See my comment at https://github.com/grafana/loki/issues/6503#issuecomment-1191957798
Same problem here, did you have a workaround ? Because containerD does not support logrotation
While we haven't made a formal decision yet, we expect in the near future that all new feature work will be done in the Agent's log collection pipelines rather than in Promtail. That being said, potentially this could be considered a bug fix and could still go into promtail.
Is https://github.com/grafana/loki/issues/6503 still a relevant fix?
Describe the bug If I create multiple .log files, say 4, and append log lines to them with high rate per second, say 50 000 log lines per second to each file, I see such errors in promtail:
Also I see slowness in appearing new log lines from these files in loki.
To Reproduce Steps to reproduce the behavior:
Expected behavior I don't expect any error, and I expect fast appearing logs' lines in loki.
Environment:
Screenshots, Promtail config, or terminal output I've tried to increase fs.inotify.max_queued_events kernel parameter to 1048576. After that, I see the same error, but a bit later than earlier (after 2-3 minutes of tailing log files)