jtmoon79 / super-speedy-syslog-searcher

Speedily search and merge log messages by datetime
MIT License

first line of non-datetimestamped text is dropped #287

Open jtmoon79 opened 5 months ago

jtmoon79 commented 5 months ago

Describe the bug

Given a text log file wherein the first X lines have no matching datetimestamp, s4 will not print those lines.

To Reproduce

Using logs/RancherOS-1.5.8/docker.log

$ s4 ./logs/RancherOS-1.5.8/docker.log
time="2022-10-09T21:33:15.541012931Z" level=info msg="Starting up"
time="2022-10-09T21:33:15.547531433Z" level=info msg="libcontainerd: started new containerd process" pid=1178
...

However, the file logs/RancherOS-1.5.8/docker.log has these contents:

crypto/rand: blocked for 60 seconds waiting to read random data from the kernel
time="2022-10-09T21:33:15.541012931Z" level=info msg="Starting up"
time="2022-10-09T21:33:15.547531433Z" level=info msg="libcontainerd: started new containerd process" pid=1178
...

This may be confusing to users. In this case, no datetime filter was passed so the user expects the entire file to be printed. But the first line is dropped.

Environment:

Additional context

This was an intentional design choice made early on. As a simplifying principle, a syslog log message is defined as a log message whose first line contains the datetimestamp. Under any other presumption, ad-hoc text log files with multi-line log messages could be too difficult or impossible to process. An additional reason to drop the first X lines without a datetimestamp is that it is unknown when those lines were written. For example, the message crypto/rand: blocked for 60 seconds waiting to read random data from the kernel may have been printed at nearly the same time as the following log message time="2022-10-09T21:33:15.541012931Z" level=info msg="Starting up", or, as those two specific messages themselves suggest, many seconds before it. Since the real datetime of those first lines cannot be known, they are deliberately dropped, as it is preferable not to mislead the user.

However, maybe a special case should be made here. Once a datetimestamp regex has been chosen for the file, any preceding lines would be included as part of the first log message. That said, changing the presumption that the first line of a log message always contains the datetimestamp substring would complicate several major components, including SyslogProcessor and printers.rs.
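
To make the two behaviors concrete, here is a minimal sketch of the grouping rule in Python. It is an illustration only, not how s4 is implemented (s4 is written in Rust and selects among many datetimestamp patterns). The names DATETIME_RE, group_messages, and keep_leading are hypothetical, and the regex is assumed to cover only the format seen in this docker.log.

```python
import re

# Hypothetical pattern covering only this file's format, e.g.
# 2022-10-09T21:33:15; s4 itself chooses among many patterns.
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def group_messages(lines, keep_leading=False):
    """Group lines into log messages.

    A line containing a datetimestamp starts a new message; any other
    line is appended to the current message. Lines before the first
    datetimestamped line are dropped (the current behavior) unless
    keep_leading is True, in which case they are prepended to the
    first message (the special case proposed above).
    """
    messages = []
    leading = []  # lines seen before the first datetimestamped line
    for line in lines:
        if DATETIME_RE.search(line):
            if not messages and keep_leading:
                # First message: attach the undated leading lines to it.
                messages.append(leading + [line])
            else:
                messages.append([line])
        elif messages:
            # Continuation line of a multi-line message.
            messages[-1].append(line)
        else:
            leading.append(line)
    return messages


if __name__ == "__main__":
    sample = [
        "crypto/rand: blocked for 60 seconds waiting to read random data from the kernel",
        'time="2022-10-09T21:33:15.541012931Z" level=info msg="Starting up"',
    ]
    for message in group_messages(sample, keep_leading=True):
        print("\n".join(message))
```

With keep_leading=False the crypto/rand line is discarded, as reported in this issue. With keep_leading=True it is printed as part of the first message, but that message no longer starts with its datetimestamp, which is the presumption that other components would need to stop relying on.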