Open yangvipguang opened 2 years ago
Hi @yangvipguang. I think this occurring when your file is rotated.
This might be a known issue. I'll take a look.
@dannykopping yes, the node log files are rotated 。 app: node log: log4js-node
@yangvipguang are you actually missing logs when this occurs, or are you just asking about the error in the logs?
This looks like good behaviour because when a file it rotated it will be renamed (but still maintain the same inode
) and then eventually deleted when the contents are compressed. I would expect these log lines to show up in those situations.
@dannykopping yes actually missing logs when the log is rolated
It appears Promtail really only tracks the positions based on the filename https://github.com/grafana/loki/blob/fb9a1989d856c78c5aaa43c56ece6f215669e066/clients/pkg/promtail/positions/positions.go#L111 and only recognizes the rotation of a file if it's actively running and watching during the rotation:
This is how this is logged:
level=info ts=2022-07-21T21:06:59.436123136Z caller=tailer.go:143 component=tailer msg="tail routine: started" path=/tmp/logs/log1.log
ts=2022-07-21T21:06:59.436160443Z caller=log.go:168 level=info msg="Seeked /tmp/logs/log1.log - &{Offset:12 Whence:0}"
level=info ts=2022-07-21T21:07:29.263830553Z caller=filetargetmanager.go:179 msg="received file watcher event" name=/tmp/logs/log1.log.1 op=CREATE
ts=2022-07-21T21:07:29.293354294Z caller=log.go:168 level=info msg="Re-opening moved/deleted file /tmp/logs/log1.log ..."
ts=2022-07-21T21:07:29.293476537Z caller=log.go:168 level=info msg="Waiting for /tmp/logs/log1.log to appear..."
level=info ts=2022-07-21T21:07:29.436953384Z caller=tailer.go:202 component=tailer msg="skipping update of position for a file which does not currently exist" path=/tmp/logs/log1.log
level=info ts=2022-07-21T21:07:29.437157557Z caller=filetarget.go:291 msg="removing directory from watcher" directory=/tmp/logs
level=info ts=2022-07-21T21:07:29.437240091Z caller=tailer.go:116 component=tailer msg="position timer: exited" path=/tmp/logs/log1.log
level=info ts=2022-07-21T21:07:29.437294963Z caller=tailer.go:202 component=tailer msg="skipping update of position for a file which does not currently exist" path=/tmp/logs/log1.log
level=info ts=2022-07-21T21:07:29.43736309Z caller=tailer.go:159 component=tailer msg="tail routine: tail channel closed, stopping tailer" path=/tmp/logs/log1.log reason=null
level=info ts=2022-07-21T21:07:29.437418767Z caller=tailer.go:152 component=tailer msg="tail routine: exited" path=/tmp/logs/log1.log
level=info ts=2022-07-21T21:07:29.437451053Z caller=tailer.go:240 component=tailer msg="stopped tailing file" path=/tmp/logs/log1.log
level=info ts=2022-07-21T21:08:29.436230999Z caller=filetarget.go:278 msg="watching new directory" directory=/tmp/logs
level=info ts=2022-07-21T21:08:29.436281994Z caller=tailer.go:143 component=tailer msg="tail routine: started" path=/tmp/logs/log1.log
ts=2022-07-21T21:08:29.436300892Z caller=log.go:168 level=info msg="Seeked /tmp/logs/log1.log - &{Offset:0 Whence:0}"
Certainly Promtail tries to read the remaining data from the just "deleted" (actually moved) file, but what happens when that fails or does not finish until Promtail is stopped?
There is documenation on how Promtail behaves in case the file was truncated (or replaced by a new file of the same name during logroation): https://github.com/grafana/loki/blob/main/docs/sources/clients/promtail/troubleshooting.md#a-tailed-file-is-truncated-while-promtail-is-not-running=
There it's clearly stated that if the new file was already bigger than the last position saved for this file (by name) there is data loss (as in "not shipped" to Loki) as there is no way to reconcile. There really seems no way around introducing the inode to positions to "abstract" away from the name of the file. The name simply changes for most cases / implementations of log rotation and the positions and filewatcher logic could be much simpler if it was just about finding candidates by pattern (read: name + potential wildcards) and then doing all of the position tracking using the inode as identifier. Adding some optimistic sorting on the creation date of the inode would even allow for backfilling of logs in (likely) the proper order.
Last but not least, using the inodes would also remove the current issue of the wildcard file pattern causing double (or multiple) ingests of the "same" file / data as described in: https://github.com/grafana/loki/pull/3655, but make this a feature:
If the inode was used to identify each file a rotated away file could also be resumed to be read from, just like Elastic documents for their Filebeat shipper: https://www.elastic.co/guide/en/beats/filebeat/current/file-log-rotation.html#file-log-rotation
@dannykopping would you potentially be open to discuss adding inode as a key to Promtail file tailing to avoid the known and documented issues with logroation?
the following may be relevant: https://github.com/grafana/loki/issues/8039#issuecomment-1424204802
Describe the bug A clear and concise description of what the bug is. Promtail often fails to read logs. log error: Restart promtail and return to normal
Version: 2.5.0 Platform: Centos7