Open richfelker opened 2 years ago
I've written and tested a proof of concept for regenerating the localindex as part of the restore operation, and it worked for restoring and continuing incremental backups from a test repository. I think this is an acceptable solution, so I'll try to polish it up and commit it. Current limitations that need to be overcome:
In order to be able to continue using an incremental backup after restoring from it, you need the localindex corresponding to it. This can be achieved by making sure it's included in the backup, but that has 2 problems:
The second problem is solvable by keeping backups of indices in a separate backup store (note: they should still be encrypted, so this would mean another bakelite backup store, not just rsync or something), but the first remains.
I think the most elegant solution would be not to back up the index at all (exclude it, either manually in the exclude file, or automatically by matching inode) and instead add functionality in the restore operation to regenerate the index. A block-only index can be created simply by decrypting the blocks and mapping the sha3 of their decrypted content to the encrypted blob sha3. The inode part of the index can only be recreated when the files are actually restored into a real filesystem and assigned inode numbers. This may be problematic if the restore is taking place onto a transport medium that's different from the final filesystem the restored data will live on.
Many users may be happy with just the block index being restored, as that covers the bulk of data in a backup with mostly files larger than 4k in size; without the inode index, new inode records would just be created for everything on the next incremental backup, but all the block data would be reusable. However, we could also dump an intermediate file for regenerating the index, mapping pathnames to inode records in the backup, which could be programmatically converted to an inode-based index once the files are in their final place.
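To make the two regeneration steps concrete, here is a rough Python sketch. It is not bakelite's actual code or index format. The function names, the dict-based index layout, the choice of SHA3-256 (the text above only says "sha3"), the caller-supplied `decrypt` callable, and the use of `(st_dev, st_ino)` as the inode key are all assumptions made for illustration.

```python
import hashlib
import os
from typing import Callable, Dict, Iterable, Tuple


def rebuild_block_index(
    encrypted_blobs: Iterable[bytes],
    decrypt: Callable[[bytes], bytes],
) -> Dict[str, str]:
    """Map sha3(decrypted content) -> sha3(encrypted blob).

    `decrypt` is a hypothetical stand-in for whatever decryption the
    restore operation already performs on each blob.
    """
    index: Dict[str, str] = {}
    for blob in encrypted_blobs:
        # Hash of the blob as stored (assumed to be SHA3-256 here).
        blob_id = hashlib.sha3_256(blob).hexdigest()
        # Hash of the plaintext block recovered from that blob.
        plain_id = hashlib.sha3_256(decrypt(blob)).hexdigest()
        index[plain_id] = blob_id
    return index


def rebuild_inode_index(
    path_records: Dict[str, str],
    root: str,
) -> Dict[Tuple[int, int], str]:
    """Convert an intermediate pathname -> inode-record map into an
    inode-keyed index, once the files sit on their final filesystem.

    `path_records` plays the role of the dumped intermediate file; the
    record identifiers are opaque strings in this sketch.
    """
    index: Dict[Tuple[int, int], str] = {}
    for relpath, record_id in path_records.items():
        # lstat so that symlinks are not followed.
        st = os.lstat(os.path.join(root, relpath))
        index[(st.st_dev, st.st_ino)] = record_id
    return index
```

The first function needs nothing but the blobs and the key, so it can run during any restore. The second has to be run against the final location of the files, which is why the intermediate pathname map would be useful when restoring onto a transport medium first.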