RFE: avoid ausearch excessive disk IO when processing a log with a checkpoint

The ausearch command, when processing a log file from a checkpoint, opens the audit log and searches from the very beginning of the file, pulling each record and checking the timestamp until it reaches the checkpoint timestamp.

When an audit log gets exceedingly large, this can have a noticeable impact on the IO subsystem, particularly on older machines with slower disks and controllers. On newer systems, it can result in unnecessary SSD wear, depending on OS buffer cache patterns.

This issue was noticed when ausearch was put into a Splunk pipeline on a group of diskless nodes that NFS-mount their root and overlay filesystems from a server. The pipeline ran a checkpointed ausearch frequently in order to provide a near-realtime feed of audit events to the Splunk collector. A number of systems had accumulated very large audit logs due to an unrelated problem in log rotation, so ausearch had to read an entire 2.5GB audit log on each run in order to pull out the ten lines that had appeared since the previous run a minute earlier, crushing the NFS server in the process. (A workaround is in place.)

A more efficient implementation of checkpoints is called for, I believe.

I have a couple of approaches in mind as to how to address this issue.

1. Add the file offset to the checkpoint file, along with the timestamp and the log file's device/inode. Have ausearch use a seek call to jump to that offset in the applicable log and then search for the timestamp from that point, perhaps after moving back a bit in the file in case anything odd happened to the file after the checkpoint was taken. The offset would need to be added in a way that preserves backwards compatibility with older versions of ausearch reading the checkpoint file.
2. Do a seek() binary search for the timestamp, based on the typical audit log record size, when the log file itself is over a size threshold. Seek to the halfway point of the file and read a timestamp. If it is earlier than the checkpoint timestamp, seek to 3/4; if that passes the checkpoint timestamp, seek back to 5/8, and so on. The goal is to get close to the checkpoint timestamp while reading as little of the file as possible, and only then begin walking through each record looking for it.
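
To make the two approaches concrete, here is a rough Python sketch (ausearch itself is written in C, so this only illustrates the logic). It assumes each record carries a `msg=audit(SECONDS.MILLIS:SERIAL)` stamp and that records appear in non-decreasing stamp order, which a real log may not strictly guarantee. The checkpoint dict, its `offset` field, the `backoff` value, and all function names are hypothetical and are not the actual ausearch checkpoint format or API.

```python
import os
import re

# Matches the "msg=audit(SECONDS.MILLIS:SERIAL)" stamp on each record line.
AUDIT_STAMP = re.compile(rb"msg=audit\((\d+)\.(\d+):(\d+)\)")


def record_key(line):
    """Return (sec, milli, serial) for a raw log line, or None if unstamped."""
    m = AUDIT_STAMP.search(line)
    return tuple(int(g) for g in m.groups()) if m else None


def next_record(f, pos):
    """Return (offset, key) of the first stamped line starting at or after pos."""
    f.seek(max(pos - 1, 0))
    if pos > 0:
        f.readline()  # consume the rest of the line containing byte pos - 1
    while True:
        start = f.tell()
        line = f.readline()
        if not line:
            return None  # reached end of file
        key = record_key(line)
        if key is not None:
            return start, key


def bisect_offset(f, size, target):
    """Approach 2: offset of the first record whose key is >= target.

    Assumption: records are in non-decreasing stamp order.
    """
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        rec = next_record(f, mid)
        if rec is None or rec[1] >= target:
            hi = mid          # answer starts at or before mid
        else:
            lo = rec[0] + 1   # everything up to this record is too old
    rec = next_record(f, lo)
    return rec[0] if rec else size


def resume_offset(path, ckpt, backoff=4096):
    """Approach 1, falling back to approach 2 when the hint cannot be used.

    ckpt is a hypothetical dict with "dev", "inode", "key" and an optional
    "offset"; a checkpoint without "offset" still works, just more slowly.
    """
    st = os.stat(path)
    with open(path, "rb") as f:
        hint = ckpt.get("offset")
        same_file = (st.st_dev, st.st_ino) == (ckpt["dev"], ckpt["inode"])
        if same_file and hint is not None and hint <= st.st_size:
            pos = max(hint - backoff, 0)  # move back a bit before searching
            rec = next_record(f, pos)
            # Trust the hint only if we landed before the checkpoint stamp.
            trusted = pos == 0 or (rec is not None and rec[1] < ckpt["key"])
            while trusted and rec is not None:
                if rec[1] >= ckpt["key"]:
                    return rec[0]
                rec = next_record(f, rec[0] + 1)
        return bisect_offset(f, st.st_size, ckpt["key"])
```

In this sketch a caller would seek to `resume_offset(path, ckpt)` and then walk records from there as ausearch does today. Treating a missing `offset` as "no hint" is one possible way to stay compatible with checkpoint files written by older versions, and a production version would probably also skip the binary search for files below a size threshold.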
