jtmoon79 / super-speedy-syslog-searcher

Speedily search and merge log messages by datetime
MIT License

fully extract `.journal`, `.evtx` compressed/archived files to temporary files #284

Closed jtmoon79 closed 5 months ago

jtmoon79 commented 6 months ago

Summary

.journal files that are compressed or archived (e.g. user-1000.journal.xz) cannot be read by s4. Allow decompressing the .journal file to a temporary .journal file, then pass that temporary file path to the underlying libsystemd.so API call sd_journal_open_files.

Current behavior

Currently, any .journal file within a compressed or archived file is treated as an ad-hoc text file. An attempt is made to match on syslog-like lines of text. That attempt fails and processing of the file is abandoned (fixing the presumption that compressed files are always text is Issue #285).

The same is true for .evtx files.

libsystemd.so cannot be given a compressed .journal file, only uncompressed .journal files. See the man page for sd_journal_open.

Suggested behavior

Compressed or archived .journal files should be decompressed to a temporary named file (use tempfile::NamedTempFile). The path of that temporary file is then passed to the systemd journal API sd_journal_open_files.
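A minimal sketch of the extraction step, using only the standard library so it stands alone. The names `TempExtract` and `extract_to_temp` are hypothetical and not part of s4. The `decoder` argument stands in for whatever decompressing reader (xz, gz, tar entry, ...) the real code would supply, and the hand-rolled file naming is only a placeholder for what `tempfile::NamedTempFile` would provide.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read};
use std::path::{Path, PathBuf};
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical guard: owns the temporary file and deletes it when dropped.
pub struct TempExtract {
    path: PathBuf,
}

impl TempExtract {
    /// Path that would be handed to the journal or evtx reader.
    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempExtract {
    fn drop(&mut self) {
        // Best effort: ignore errors, the file may already be gone.
        let _ = fs::remove_file(&self.path);
    }
}

/// Copy all bytes from `decoder` into a new file under the system temp
/// directory. `suffix` (e.g. ".journal") is kept so the name still looks
/// like the expected file type.
pub fn extract_to_temp<R: Read>(mut decoder: R, suffix: &str) -> io::Result<TempExtract> {
    // Assumption: process id plus a timestamp is unique enough for a sketch.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let name = format!("s4-{}-{}{}", process::id(), nanos, suffix);
    let path = std::env::temp_dir().join(name);

    // `create_new` fails instead of overwriting an existing file.
    let file = OpenOptions::new().write(true).create_new(true).open(&path)?;
    // From here on, any early return removes the partially written file.
    let guard = TempExtract { path };
    {
        // Inner scope so the file handle closes before the guard can drop.
        let mut file = file;
        io::copy(&mut decoder, &mut file)?;
    }
    Ok(guard)
}
```

Because the guard is created before the copy starts, a failed copy (for example, a full disk) drops the guard and removes the partial file. The guard does not run if the process is killed by a signal.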

The same should be done for the EvtxReader struct.

Some additional changes are needed so this is hidden from the end-user; e.g. CLI option --prepend-filename should not print the path of the named temporary file. Also, the --summary output should gain one or more lines describing this named temporary file.
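One way to keep the temporary path out of user-facing output is to carry both paths together and choose by purpose. This is only an illustration: `ExtractedSource`, its methods, and the wording of the summary line are hypothetical and do not reflect s4's actual types or output.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical record pairing the path the user passed with the
/// temporary file that holds the extracted data.
pub struct ExtractedSource {
    pub user_path: PathBuf,
    pub temp_path: PathBuf,
}

impl ExtractedSource {
    /// Path to open for reading (what the journal API would receive).
    pub fn read_path(&self) -> &Path {
        &self.temp_path
    }

    /// Path to print for --prepend-filename: always the user's path.
    pub fn display_path(&self) -> &Path {
        &self.user_path
    }

    /// Extra line for --summary; the exact wording is an assumption.
    pub fn summary_line(&self) -> String {
        format!("extracted to temporary file: {}", self.temp_path.display())
    }
}
```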

This introduces the difficulty of handling "Out of disk space" errors during writing to the temporary file. Tests should be created that simulate this situation and verify it is handled appropriately.
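A test for this could avoid filling a real disk by writing through a stand-in writer that fails after a fixed number of bytes. The sketch below assumes the extraction code is generic over `std::io::Write`; `FullDisk`, `Extract`, and `extract` are hypothetical names, and the error text only imitates the usual ENOSPC message.

```rust
use std::io::{self, Read, Write};

/// Test double: accepts up to `capacity` bytes, then fails like a full disk.
pub struct FullDisk {
    pub written: Vec<u8>,
    pub capacity: usize,
}

impl Write for FullDisk {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let room = self.capacity - self.written.len();
        if room == 0 && !buf.is_empty() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "No space left on device",
            ));
        }
        // Partial writes are allowed until the capacity is used up.
        let n = room.min(buf.len());
        self.written.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Outcome of an extraction attempt, so the caller can report the failure
/// and skip the file instead of panicking.
#[derive(Debug, PartialEq)]
pub enum Extract {
    Complete(u64),
    Failed(String),
}

/// Copy `src` to `dst`, turning any I/O error into `Extract::Failed`.
pub fn extract<R: Read, W: Write>(mut src: R, dst: &mut W) -> Extract {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => return Extract::Complete(total),
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Extract::Failed(e.to_string()),
        };
        if let Err(e) = dst.write_all(&buf[..n]) {
            return Extract::Failed(e.to_string());
        }
        total += n as u64;
    }
}
```

In a real test the same check would also assert that the partially written temporary file was removed afterwards.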

Also, if s4 is forced to quit early, is it possible to clean up those temporary files? This would probably require introducing custom signal handling.

A nice small addition would be verifying that the file is a valid journal file before fully decompressing it, by searching for a file type signature. This avoids a costly operation (extracting to a temporary file) that would then fail during reads. See the "fingerprint" matching mentioned in Issue #257 and Issue #270.
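A sketch of such a signature check follows. It reads only the first 8 bytes from the decompressing reader, so only the start of the stream is decoded. The magic values are stated from memory of the published file formats (systemd journal files beginning with `LPKSHHRH`, EVTX files beginning with `ElfFile` followed by a NUL byte) and should be verified against the format documentation before use. `Fingerprint` and `fingerprint` are hypothetical names.

```rust
use std::io::{self, Read};

// Assumed signatures; verify against the format documentation.
pub const JOURNAL_MAGIC: &[u8; 8] = b"LPKSHHRH";
pub const EVTX_MAGIC: &[u8; 8] = b"ElfFile\0";

#[derive(Debug, PartialEq)]
pub enum Fingerprint {
    Journal,
    Evtx,
    Unknown,
}

/// Read up to 8 bytes from `decoder` and classify the stream.
/// Streams shorter than 8 bytes are reported as `Unknown`.
pub fn fingerprint<R: Read>(mut decoder: R) -> io::Result<Fingerprint> {
    let mut head = [0u8; 8];
    let mut filled = 0;
    // A single `read` may return fewer bytes than requested, so loop.
    while filled < head.len() {
        match decoder.read(&mut head[filled..]) {
            Ok(0) => break, // end of stream
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    if filled < head.len() {
        return Ok(Fingerprint::Unknown);
    }
    Ok(if &head == JOURNAL_MAGIC {
        Fingerprint::Journal
    } else if &head == EVTX_MAGIC {
        Fingerprint::Evtx
    } else {
        Fingerprint::Unknown
    })
}
```

A caller would run this check on a fresh decoder first and start the full extraction only when the result is `Journal` or `Evtx`.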