jtmoon79 / super-speedy-syslog-searcher

Speedily search and merge log messages by datetime
MIT License
38 stars 1 forks source link

large compressed files with datetime pattern without year, fail to fully process #305

Closed jtmoon79 closed 2 months ago

jtmoon79 commented 3 months ago

Describe the bug

A gzipped file with datetime pattern without a year, e.g. Mar 21 23:33:53 fink NetworkManager[755]: <info>, prints the first few lines then quits.

To Reproduce

Given file like kern.log.2.gz found on a Ubuntu 16 system

Dec 15 11:25:15 fink kernel: [    0.000000] Linux version 4.15.0-112-generic (buildd@lcy01-amd64-021) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)) #113~16.04.1-Ubuntu SMP Fri Jul 10 04:37:08 UTC 2020 (Ubuntu 4.15.0-112.113~16.04.1-generic 4.15.18)
Dec 15 11:25:15 fink kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-112-generic root=UUID=c8be1632-2d46-419a-a239-d2499461000b ro quiet splash
Dec 15 11:25:15 fink kernel: [    0.000000] KERNEL supported cpus:
Dec 15 11:25:15 fink kernel: [    0.000000]   Intel GenuineIntel
Dec 15 11:25:15 fink kernel: [    0.000000]   AMD AuthenticAMD
Dec 15 11:25:15 fink kernel: [    0.000000]   Centaur CentaurHauls
Dec 15 11:25:15 fink kernel: [    0.000000] [Firmware Bug]: TSC doesn't count with P0 frequency!
Dec 15 11:25:15 fink kernel: [    0.000000] x86/fpu: x87 FPU will use FXSAVE
Dec 15 11:25:15 fink kernel: [    0.000000] e820: BIOS-provided physical RAM map:

This file is 306282 bytes decompressed according to gzip -vdck /var/log/kern.log.2.gz | wc -c

s4 only prints the first three lines and then stops processing.

Debug build has these interesting messages:


                ←find_sysline_between_datetime_filters: (430)
                →find_sysline_between_datetime_filters: (497)
                    →find_sysline_between_datetime_filters: (497, None, None)
                        →find_sysline_at_datetime_filter_linear_search: (SyslineReader@0x7f51473f2bb8, 497, None)
                         find_sysline_at_datetime_filter_linear_search: self.find_sysline(497)
                                →find_sysline_year: (497, None)
                                    →check_store: (497)
                                     check_store: fileoffset 497 not found in LRU cache
                                     check_store: fileoffset 497 not found in self.syslines_by_range
                                     check_store: fileoffset 497 not found in self.syslines
                                    ←check_store: return None
                                 find_sysline_year: (497): searching for first sysline datetime A …
                                 find_sysline_year: (497): self.linereader.find_line(497)
                                    →find_line: (497)
                                    ↔check_store_LRU: fileoffset 497 not found in LRU cache
                                     fileoffset 497 not found in self.lines_by_range
                                    ←check_store: fileoffset 497 not found in self.lines
                                     find_line: searching for first newline B (line terminator) …
                                        →read_block: (0): blockreader.read_block(0) (fileoffset 0 (0x00000000)), blocksz 65535 (0x0000FFFF), filesz 306282 (0x0004AC6A)
                                         read_block: blockoffset 0 not found LRU cache
                                         read_block: blockoffset 0 not found in blocks_read
                                            →read_block_FileGz: (0)
                                            ←read_block_FileGz: (0): return Done
                                    ←find_line: B1: read_block(0) returned Done "./logs/Ubuntu16/kern.log.2.gz"
                                 find_sysline_year: (497): LRU cache put(497, Done)
                                ←find_sysline_year: (497): return ResultS3SyslineFind::Done; A from LineReader.find_line(497)
                         find_sysline_at_datetime_filter_linear_search: self.find_sysline(497) returned Done
                        ←find_sysline_at_datetime_filter_linear_search: return ResultS3SyslineFind::Done
                     find_sysline_between_datetime_filters: returned Done
                    ←find_sysline_between_datetime_filters: return ResultS3SyslineFind::Done
                ←find_sysline_between_datetime_filters: (497)

Environment:

Other

Looking at debug output, this appears to be a result of #283

jtmoon79 commented 3 months ago

My guess is the "zero block stage" which has special processing for ad-hoc logs without a year, also proactively dropped old blocks. Then during the later "streaming stage", the request for block 0 failed. That block was already dropped leading to read_block_FileGz returning Done.

jtmoon79 commented 3 months ago

This should have been caught by compare-current-and-expected/compare.sh.

jtmoon79 commented 3 months ago

This may not be related to being a compressed file. Just noticed an error with datetime matching

$ ./target/release/s4 -u -s ./logs/other/tests/gen-20-2-2-faces.log
19700101T070001.000+0000:01 😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟😠😡😢😣😤😥😦😧😨😩😪😫😬😭😮😯😰😱😲😳😴😵😶😷😸😹😺😻😼😽😾😿🙀🙁🙂🙃
...

The 01 is interpreted as 1970-01-01T07:00:01.