Closed: the-mikedavis closed this 1 month ago
I took some rough measurements with `tprof` from OTP 27. The gist is that the time and memory savings look pretty good: 1.62s down to 0.28s, and 178 million words of memory down to ~20 million, for `ra_snapshot:init/6` on a QQ's checkpoint directory (from the `qq-v4` branch) with 5 million messages.
We can validate other checkpoints during promotion and discard any that fail.
For a quorum queue where consumers keep up with ingress, checkpoints are promoted very often. It would be nice not to have to do the validation work every time just because we optimised recovery. My thought was that once we'd found a valid checkpoint during recovery, we'd assume all prior checkpoints are also valid. That should be roughly as good as promoting any other checkpoint.
The most likely way a checkpoint would become corrupted is if the server stopped hard during a write or fsync. There are other ways checkpoints could become corrupted, but this at least guards against the most likely one.
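The "stop at the first valid checkpoint" idea could be sketched roughly as below. This is a hypothetical illustration, not the actual ra implementation; the function and variable names are assumed, and checkpoints are assumed to be ordered newest-first.

```erlang
%% Hypothetical sketch: walk checkpoints newest-first, validating each
%% one until the first valid checkpoint is found. Everything older than
%% that checkpoint is kept without validation, on the assumption that
%% corruption from a hard stop only affects the most recent write.
find_latest_valid([], _ValidateFun) ->
    [];
find_latest_valid([Checkpoint | Older], ValidateFun) ->
    case ValidateFun(Checkpoint) of
        ok ->
            %% trust all checkpoints older than the first valid one
            [Checkpoint | Older];
        {error, _Reason} ->
            %% corrupt: skip it and keep searching
            find_latest_valid(Older, ValidateFun)
    end.
```

In the common case (no corruption) this validates exactly one checkpoint, which is where the time and memory savings come from.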
@mkuratczyk noticed that with many QQs on the `qq-v4` branch, each QQ having many checkpoints, we spend a fair amount of effort reading the checkpoints during recovery. This is because `ra_snapshot:find_checkpoints/1` uses the `ra_snapshot:validate/1` callback to ensure that each snapshot is valid. `validate/1` is somewhat expensive in `ra_log_snapshot` since it fully reads and decodes the checkpoint, discarding the result.

Not all of this validation is necessary: we can stop validating checkpoints once we find the latest checkpoint that is valid. This is likely to be good enough. I've also updated `find_checkpoints/1` to stop its search when it finds a checkpoint with a lower index than the current snapshot, as any checkpoints with an index lower than the snapshot index won't be used for promotion and should be removed. For many QQs with many checkpoints each, this should save some I/O and memory.