jtmoon79 / super-speedy-syslog-searcher

Speedily search and merge log messages by datetime
MIT License

syslog files without a year and leap date Feb. 29 do not revise the guesstimated year #245

Open jtmoon79 opened 3 months ago

jtmoon79 commented 3 months ago

Summary

A syslog file whose log message format lacks a year does not revise the guesstimated year when it encounters a log message with a datestamp on leap day, February 29.

To Reproduce

Use the committed log file ./logs/other/tests/dtf9c-23-12x2.log.gz, which has a filesystem Last Modified Time of September 2018. The datetime encoded in the GZIP header is September 2, 2022 9:45:35 PM GMT-07:00 (derived from modified time 1662180335). The GZIP-encoded datetime is used to place the last log message, Dec 12 12:00:00 23; its datestamp Dec 12 12:00:00 is guesstimated to be in year 2022.
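As a quick check of the conversion quoted above, the following Python snippet turns the modified time 1662180335 into a datetime at a fixed UTC-07:00 offset. It is only an illustration using the Python standard library, not code from s4; the names GZIP_MTIME and TZ_MINUS_7 are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# GZIP modified time quoted in this issue, in seconds since the Unix epoch.
GZIP_MTIME = 1662180335

# Fixed UTC-07:00 offset, matching the "GMT-07:00" quoted in this issue.
TZ_MINUS_7 = timezone(timedelta(hours=-7))

mtime_utc = datetime.fromtimestamp(GZIP_MTIME, tz=timezone.utc)
mtime_local = mtime_utc.astimezone(TZ_MINUS_7)

print(mtime_utc.isoformat())    # 2022-09-03T04:45:35+00:00
print(mtime_local.isoformat())  # 2022-09-02T21:45:35-07:00
```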

During processing, log message Feb 29 02:00:00 1 is encountered. It is given the incorrect year 2021, which is not a leap year, so February 29 does not exist in that year.

$ ls -l ./logs/other/tests/dtf9c-23-12x2.log.gz
-rw-r--r-- 1 user user 207 Sep  2  2018 ./logs/other/tests/dtf9c-23-12x2.log.gz

$ gunzip --stdout -k ./logs/other/tests/dtf9c-23-12x2.log.gz
Jan 1 01:00:00 0
Feb 29 02:00:00 1
Mar 3 03:00:00 2
Apr 4 04:00:00 3
May 5 05:00:00 4
Jun 6 06:00:00 5
Jul 7 07:00:00 6
Aug 8 08:00:00 7
Sep 9 09:00:00 8
Oct 10 10:00:00 9
Nov 11 11:00:00 10
Dec 12 12:00:00 11
Jan 1 01:00:00 12
Feb 28 02:00:00 13
Mar 3 03:00:00 14
Apr 4 04:00:00 15
May 5 05:00:00 16
Jun 6 06:00:00 17
Jul 7 07:00:00 18
Aug 8 08:00:00 19
Sep 9 09:00:00 20
Oct 10 10:00:00 21
Nov 11 11:00:00 22
Dec 12 12:00:00 23

The result

$ ./target/release/s4 ./logs/other/tests/dtf9c-23-12x2.log.gz -u
20210101T080000.000+0000:Jan 1 01:00:00 0
20210101T080000.000+0000:Feb 29 02:00:00 1
20210303T100000.000+0000:Mar 3 03:00:00 2
20210404T110000.000+0000:Apr 4 04:00:00 3
20210505T120000.000+0000:May 5 05:00:00 4
20210606T130000.000+0000:Jun 6 06:00:00 5
20210707T140000.000+0000:Jul 7 07:00:00 6
20210808T150000.000+0000:Aug 8 08:00:00 7
20210909T160000.000+0000:Sep 9 09:00:00 8
20211010T170000.000+0000:Oct 10 10:00:00 9
20211111T180000.000+0000:Nov 11 11:00:00 10
20211212T190000.000+0000:Dec 12 12:00:00 11
20220101T080000.000+0000:Jan 1 01:00:00 12
20220228T090000.000+0000:Feb 28 02:00:00 13
20220303T100000.000+0000:Mar 3 03:00:00 14
20220404T110000.000+0000:Apr 4 04:00:00 15
20220505T120000.000+0000:May 5 05:00:00 16
20220606T130000.000+0000:Jun 6 06:00:00 17
20220707T140000.000+0000:Jul 7 07:00:00 18
20220808T150000.000+0000:Aug 8 08:00:00 19
20220909T160000.000+0000:Sep 9 09:00:00 20
20221010T170000.000+0000:Oct 10 10:00:00 21
20221111T180000.000+0000:Nov 11 11:00:00 22
20221212T190000.000+0000:Dec 12 12:00:00 23

Expected

Since the GZIP Modified Time is Sept. 2022, the last message Dec 12 12:00:00 23 should be given year 2022 (which it currently is). However, upon encountering Feb 29 02:00:00 1, that message is given datetime 20210101T080000.000+0000 (Jan 1 2021). Instead, the processing should notice that the attempt to create a datetime for that log message failed. It should then attempt a year in which leap day is valid, e.g. 2000, and if that succeeds then it can confirm the date is Feb. 29. When datestamp Feb. 29 is confirmed, the processing should revise its year guesstimate, and then update previously processed log messages.
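The retry described above might look roughly like the following Python sketch. It is not the s4 implementation (which is written in Rust); parse_with_year, needs_year_revision, and LEAP_PROBE_YEAR are hypothetical names, and the format string assumes the year-less "Mon D HH:MM:SS" stamps shown in this issue and English month abbreviations.

```python
from datetime import datetime

# A known leap year used only to probe whether a stamp is Feb 29
# (2000 is the example year suggested in this issue).
LEAP_PROBE_YEAR = 2000


def parse_with_year(stamp, year):
    """Parse a year-less stamp such as 'Feb 29 02:00:00' using `year`.

    Returns None if the date does not exist in that year.
    """
    try:
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def needs_year_revision(stamp, guessed_year):
    """True if `stamp` is Feb 29 but `guessed_year` is not a leap year."""
    if parse_with_year(stamp, guessed_year) is not None:
        return False  # the guessed year already yields a valid datetime
    probe = parse_with_year(stamp, LEAP_PROBE_YEAR)
    return probe is not None and (probe.month, probe.day) == (2, 29)


print(needs_year_revision("Feb 29 02:00:00", 2021))  # True
print(needs_year_revision("Feb 28 02:00:00", 2021))  # False
```

A stamp that fails to parse in both years is treated as unparseable rather than as a leap day, so the sketch returns False for it.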

In this case, during the backwards search for log messages in process_missing_year, it should allow for matching date Feb 29 no matter the guesstimated year. The revised year guesstimate should result in the last message Dec 12 12:00:00 23 being given year 2017, and message Feb 29 02:00:00 1 being given a valid datetime in year 2016, i.e. 20160229T090000.000+0000.

$ ./target/release/s4 ./logs/other/tests/dtf9c-23-12x2.log.gz -u
20160101T080000.000+0000:Jan 1 01:00:00 0
20160229T090000.000+0000:Feb 29 02:00:00 1
20160303T100000.000+0000:Mar 3 03:00:00 2
20160404T110000.000+0000:Apr 4 04:00:00 3
20160505T120000.000+0000:May 5 05:00:00 4
20160606T130000.000+0000:Jun 6 06:00:00 5
20160707T140000.000+0000:Jul 7 07:00:00 6
20160808T150000.000+0000:Aug 8 08:00:00 7
20160909T160000.000+0000:Sep 9 09:00:00 8
20161010T170000.000+0000:Oct 10 10:00:00 9
20161111T180000.000+0000:Nov 11 11:00:00 10
20161212T190000.000+0000:Dec 12 12:00:00 11
20170101T080000.000+0000:Jan 1 01:00:00 12
20170228T090000.000+0000:Feb 28 02:00:00 13
20170303T100000.000+0000:Mar 3 03:00:00 14
20170404T110000.000+0000:Apr 4 04:00:00 15
20170505T120000.000+0000:May 5 05:00:00 16
20170606T130000.000+0000:Jun 6 06:00:00 17
20170707T140000.000+0000:Jul 7 07:00:00 18
20170808T150000.000+0000:Aug 8 08:00:00 19
20170909T160000.000+0000:Sep 9 09:00:00 20
20171010T170000.000+0000:Oct 10 10:00:00 21
20171111T180000.000+0000:Nov 11 11:00:00 22
20171212T190000.000+0000:Dec 12 12:00:00 23
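One way to picture the revision step is the Python sketch below. It is an assumption-laden illustration, not the logic of process_missing_year: assign_years and is_leap are hypothetical helpers, only the month and day of each stamp are compared, and the starting guess is passed in as a parameter. It walks the stamps backwards from the last message, decrements the year at each year boundary, and then shifts every year back until each Feb 29 lands on a leap year.

```python
MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def assign_years(stamps, last_year_guess):
    """Return one year per stamp; `stamps` is in file order, oldest first."""
    # (month, day) for each stamp such as "Feb 29 02:00:00"
    keys = [(MONTHS.index(s.split()[0]) + 1, int(s.split()[1])) for s in stamps]
    years = [0] * len(keys)
    year = last_year_guess
    for i in range(len(keys) - 1, -1, -1):
        # Walking backwards, a later month/day than the following message
        # means a year boundary was crossed.
        if i + 1 < len(keys) and keys[i] > keys[i + 1]:
            year -= 1
        years[i] = year
    # Shift all years back until every Feb 29 falls in a leap year.
    # Consecutive leap years are at most 8 years apart, so 0..8 suffices.
    for shift in range(9):
        if all(is_leap(y - shift) for k, y in zip(keys, years) if k == (2, 29)):
            return [y - shift for y in years]
    raise ValueError("no consistent leap-year assignment found")


# A subset of the lines in dtf9c-23-12x2.log.gz, in file order.
STAMPS = [
    "Jan 1 01:00:00", "Feb 29 02:00:00", "Dec 12 12:00:00",
    "Jan 1 01:00:00", "Feb 28 02:00:00", "Dec 12 12:00:00",
]

print(assign_years(STAMPS, 2018))  # [2016, 2016, 2016, 2017, 2017, 2017]
```

The starting guess of 2018 is an assumption chosen for this example because it reproduces the years 2016 and 2017 in the expected output above; a different starting guess would shift to a different leap year.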

Additional context

Found while investigating #189.

jtmoon79 commented 3 months ago

I need to confirm that date Feb. 29 is possible under some circumstances, i.e. that it is not ignored in all circumstances.