rabbitmq / osiris

Log based streaming subsystem for RabbitMQ

Add new osiris_log:tail(Directory) function. #146

Open kjnilsson opened 11 months ago

kjnilsson commented 11 months ago

Currently the stream coordinator uses the osiris_log:overview/1 function to get the tail of each member during the writer selection process. This function does unnecessary index scanning work that the coordinator does not make use of. Instead, osiris_log:tail/1 would only get the last valid {Epoch, ChunkId}, which should be much more efficient.

In addition, it will return a "dirty" indicator to signal whether it could detect that the server node the member was running on was shut down uncleanly, such that the unwritten part of the page cache was lost. This indicator can be used by the stream coordinator to adjust the selectable set to avoid electing members that do not have all the

This indicator can never be 100% accurate but it should be able to catch the most common page cache loss scenarios by doing the following checks.

If the last index record has no trailing data itself, points to a valid chunk in the segment (CRC passes), and there is at most 1 trailing (but valid) chunk, the osiris log is considered ok. During normal operations the chunk is written before the index entry, and the RabbitMQ process could crash between these two events, which is why a single valid trailing chunk in the segment is not indicative of page cache loss. Two or more trailing chunks would, however, indicate that blocks belonging to the index file were never flushed correctly.

An empty index where there is no corresponding segment file is also considered ok: when a segment fills and we need to open a new one, the index file is written first. If there is a segment file (empty or not) and the index does not even have its index header, we consider this indicative of page cache loss.

The most common scenario indicating page cache loss is most likely an index record pointing to missing segment data, but this check may well need to be refined and evolve over time, based on actual testing with different real file systems and storage types.
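The checks above can be sketched as a small decision function. This is only an illustrative model in Python, not the osiris implementation: `LogState`, its fields and `is_dirty` are hypothetical names, and the inputs are assumed to have been gathered already by inspecting the last index/segment pair. Two cases are assumptions, since the description does not spell them out: an index with a header but no records next to an existing segment is judged by the same trailing-chunk rule, and trailing bytes after the last index record are treated as dirty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogState:
    """Hypothetical summary of the last index/segment pair of a log."""
    segment_exists: bool
    index_has_header: bool
    index_record_count: int
    # True if there are partial bytes after the last complete index record.
    index_has_trailing_bytes: bool = False
    # True if the last index record points to a chunk whose CRC passes,
    # False if the data is missing or invalid, None if there are no records.
    last_record_chunk_valid: Optional[bool] = None
    # Number of valid chunks in the segment after the last indexed chunk.
    trailing_valid_chunks: int = 0


def is_dirty(state: LogState) -> bool:
    """Return True if the state suggests page cache loss."""
    if not state.index_has_header:
        # An empty index with no segment is ok (the index is written first);
        # a segment with an index lacking even its header is not.
        return state.segment_exists
    if state.index_record_count == 0:
        if not state.segment_exists:
            return False
        # Assumption: apply the same "at most one trailing chunk" rule.
        return state.trailing_valid_chunks > 1
    if state.index_has_trailing_bytes:
        # Assumption: a partially written index record counts as dirty.
        return True
    if not state.segment_exists or not state.last_record_chunk_valid:
        # Index record pointing to missing or invalid segment data.
        return True
    # One valid trailing chunk can result from a crash between the chunk
    # write and the index write; two or more cannot.
    return state.trailing_valid_chunks > 1
```

For example, `is_dirty(LogState(True, True, 10, False, True, 1))` models a crash between the chunk write and the index write and is not reported as dirty, while the same state with two trailing chunks is.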

kjnilsson commented 11 months ago

It's important to note that this check only works once. After an osiris log has been initialised against the directory, the files will have any dangling / trailing entries truncated, and the next time it will return ok even if the member may not have caught up with where it was before the page cache was lost.

Still it adds more steps to recreate failure scenarios, pushing the boat farther out.
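As a rough illustration of how a coordinator could use the proposed result during writer selection, the following Python sketch filters out dirty members and then picks the highest {Epoch, ChunkId}. The names `MemberTail` and `select_writer` are hypothetical, and both the tuple ordering and the choice to return nothing when every member is dirty are assumptions rather than the stream coordinator's actual policy.

```python
from typing import NamedTuple, Optional, Sequence


class MemberTail(NamedTuple):
    """Hypothetical per-member result of a tail lookup."""
    node: str
    epoch: int
    chunk_id: int
    dirty: bool


def select_writer(tails: Sequence[MemberTail]) -> Optional[MemberTail]:
    """Pick the clean member with the highest (epoch, chunk_id).

    Returns None when no clean member is available; what to do in that
    case is left to the caller in this sketch.
    """
    clean = [t for t in tails if not t.dirty]
    if not clean:
        return None
    return max(clean, key=lambda t: (t.epoch, t.chunk_id))
```

With members `("rabbit@a", 3, 1200, dirty)`, `("rabbit@b", 3, 1100, clean)` and `("rabbit@c", 2, 1500, clean)`, this sketch selects `rabbit@b`: the dirty member is excluded even though it reports the highest chunk id, and epoch takes precedence over chunk id among the rest.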