Add new osiris_log:tail(Directory) function.

Currently the stream coordinator uses the osiris_log:overview/1 function to get the tail of each member during the writer selection process. This function does index scanning work whose results the coordinator never uses. Instead, osiris_log:tail/1 only retrieves the last valid {Epoch, ChunkId}, which should be much more efficient.

In addition, it returns a "dirty" indicator that signals whether it could detect that the server node the member was running on was shut down uncleanly, such that the unwritten part of the page cache was lost. The stream coordinator can use this indicator to adjust the selectable set and avoid electing members that do not have all the data.
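As a rough illustration of how a coordinator might use such an indicator, here is a minimal Python sketch (not the Erlang implementation). The function name `select_writer`, the input shape, and the fallback to all members when every tail is dirty are assumptions made for this example; the actual selection rules in the stream coordinator may differ.

```python
from typing import Dict, Optional, Tuple

# A tail is the last valid (epoch, chunk_id) pair reported by a member.
Tail = Tuple[int, int]


def select_writer(tails: Dict[str, Tuple[Tail, bool]]) -> Optional[str]:
    """Pick the member with the highest tail, preferring non-dirty members.

    `tails` maps a member name to ((epoch, chunk_id), dirty).
    Hypothetical helper: falling back to dirty members when no clean
    member exists is an assumption, not documented behaviour.
    """
    if not tails:
        return None
    # Keep only members whose log was not flagged as dirty.
    clean = {name: tail for name, (tail, dirty) in tails.items() if not dirty}
    # Assumption: if every member is dirty, consider all of them.
    candidates = clean or {name: tail for name, (tail, _) in tails.items()}
    # Tuples compare lexicographically: epoch first, then chunk id.
    # The member name is only a deterministic tie-breaker.
    return max(candidates, key=lambda name: (candidates[name], name))
```

In this sketch a dirty member is skipped even when it reports the highest tail, since its reported tail may not reflect what was actually replicated before the crash.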

This indicator can never be 100% accurate, but it should catch the most common page cache loss scenarios by doing the following checks.

If the last index record has no trailing data itself, points to a valid chunk in the segment (CRC passes), and is followed by at most 1 trailing (but valid) chunk, the osiris log is considered ok. During normal operations the chunk is written before the index entry, and the RabbitMQ process could crash in between these two events, which is why a single valid trailing chunk in the segment is not indicative of page cache loss. Two or more trailing chunks would, however, indicate that blocks belonging to the index file were never flushed correctly.

An empty index where there is no corresponding segment file is also considered ok, because when a segment fills and we need to open a new one, the index file is written first. If there is a segment file (empty or not) and the index does not even have its index header, we consider this indicative of page cache loss.
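The checks described above can be sketched as a small decision function. This is a Python illustration only, not the Erlang code in osiris_log. The `TailObservation` type and its field names are hypothetical, and treating every case the description does not explicitly call ok as dirty (including a header-only index next to a segment with more than one chunk) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TailObservation:
    """Hypothetical summary of what was found on disk for one member."""
    segment_exists: bool            # is there a segment file (empty or not)?
    index_has_header: bool          # does the index contain its header?
    index_has_records: bool         # at least one complete index record?
    index_has_trailing_bytes: bool  # partial data after the last record?
    last_chunk_crc_ok: bool         # chunk referenced by the last record is valid
    trailing_valid_chunks: int      # valid chunks after the referenced one


def is_dirty(obs: TailObservation) -> bool:
    """Return True when the observation suggests page cache loss."""
    if not obs.index_has_header:
        # The index file is written first when a new segment is opened, so
        # an empty index is fine only if no segment file exists yet.
        return obs.segment_exists
    if not obs.index_has_records:
        # Assumption: a header-only index tolerates at most one chunk,
        # mirroring the "chunk written before index entry" rule.
        return obs.segment_exists and obs.trailing_valid_chunks > 1
    if obs.index_has_trailing_bytes:
        # Assumption: a partial index record is treated as dirty.
        return True
    if not obs.segment_exists or not obs.last_chunk_crc_ok:
        # Index record points to missing or invalid segment data.
        return True
    # One trailing chunk can result from a plain process crash;
    # two or more suggest index blocks were never flushed.
    return obs.trailing_valid_chunks > 1
```

Keeping the observation separate from the decision makes it easier to refine individual rules later, for example after testing on different file systems.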

The most common scenario indicating page cache loss is most likely an index record pointing to missing segment data, but this check may well need to be refined over time, based on actual testing with different real file systems and storage types.

rabbitmq / osiris

Add new osiris_log:tail(Directory) function. #146