jthornber / blk-archive

Dedup and compress your device mapper devices. Works especially well with thin provisioning.

Handle interrupted pack #12

Open · tasleson opened 1 year ago

tasleson commented 1 year ago

Initial attempt at providing a way to put an archive back into a good state if a pack operation is interrupted.

From git commit 280131e

The most important objective is to prevent the data slab and hashes slab from
getting corrupted and losing archived data. Incomplete writes to the slabs
during a pack should be the only way for them to end up in an inconsistent
state. To let us detect and correct this, we introduce a checkpoint file at the
root of the archive, which is written and sync'd to stable storage before the
pack operation starts. If the pack operation is then interrupted, a repair
option can put the slab files back to where they were before it started.
Moving forward, the idea is to add the ability to periodically update the
checkpoint during long-running operations by quiescing I/O to the data slab,
hashes slab, offsets files, and stream output, and recording the current offset
into the input data. An interrupted operation could then be resumed by checking
the files, truncating where needed, and continuing the dedup operation.

Note: if the data slab file and the hashes slab file show no corruption and the
number of slabs matches between them, the slab files are not touched! The
archive can therefore take up much more space than its listing suggests,
because the data written by the interrupted pack is retained but the stream
that references it is not.
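The rule in the note above can be expressed as a small decision function. This is a sketch under stated assumptions: `SlabFileState`, `RepairAction`, `decide_repair`, and `unreferenced_slabs` are hypothetical names, and the sketch assumes a separate consistency check has already reported whether each file is intact and how many complete slabs it holds. Only the rule itself (leave the files alone when both are intact and their slab counts match) comes from the description.

```rust
/// Hypothetical summary of what a consistency check found in one slab file.
#[derive(Debug, Clone, Copy)]
struct SlabFileState {
    /// True if every slab in the file was read back and verified cleanly.
    intact: bool,
    /// Number of complete slabs found in the file.
    nr_slabs: u64,
}

/// What a repair pass should do with the slab files.
#[derive(Debug, PartialEq, Eq)]
enum RepairAction {
    /// Leave both files as they are. Data from the interrupted pack stays in
    /// the slabs even though no stream references it.
    LeaveUntouched,
    /// Truncate both files back to the lengths recorded in the checkpoint.
    RollBackToCheckpoint,
}

/// Only roll back when a file is damaged or the two files disagree on the
/// slab count; otherwise the slab files are left alone.
fn decide_repair(data: SlabFileState, hashes: SlabFileState) -> RepairAction {
    if data.intact && hashes.intact && data.nr_slabs == hashes.nr_slabs {
        RepairAction::LeaveUntouched
    } else {
        RepairAction::RollBackToCheckpoint
    }
}

/// Slabs appended after the checkpoint. When the files are left untouched,
/// these occupy space in the archive without being referenced by any stream
/// at repair time, so they do not show up in the archive listing.
fn unreferenced_slabs(slabs_at_checkpoint: u64, data: SlabFileState) -> u64 {
    data.nr_slabs.saturating_sub(slabs_at_checkpoint)
}
```

For example, if the checkpoint was taken at 10 slabs and both files are intact with 14 slabs each, the files are left untouched and 4 slabs of data from the interrupted pack remain in the archive without a stream.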

I guess I could add a statement that the archive can still be corrupted by bit rot, but that will be addressed in a future change where we introduce erasure coding support or something similar.