tasket / wyng-backup

Fast backups for logical volumes & disk images
GNU General Public License v3.0

FUSE filesystem for mounting archives as volumes #16

Open tasket opened 5 years ago

tasket commented 5 years ago

Use case

Retrieving individual files from backed-up volumes.

Solution

A read-only FUSE type filesystem that can mount any volume-session in the archive as a complete volume should be attainable without a ton of effort. This is because the archive format is similar to Apple's sparsebundles.

Blocking

Related projects

https://github.com/jeffmahoney/sparsebundle-loopback

https://github.com/torarnv/sparsebundlefs

tlaurion commented 1 year ago

@tasket: How similar is the archive format to Apple's sparsebundle?

tasket commented 1 year ago

I think they are 75-80% similar. The basic idea in sparsebundle is "bands", which map directly to Wyng's chunks.

Of course, "sparsebundle" on its own doesn't account for all the other stuff Apple layered on it to handle sessions/snapshots, but that matters little because Wyng's metadata can very easily create a complete map of all chunks belonging to a session. So: run merge_manifests(), give the resulting list to a FUSE driver (which we know can be written in Python), and enjoy access.
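To illustrate what the driver's read path would have to do with such a list, here is a minimal sketch in plain Python (no FUSE bindings). It assumes fixed-size, uncompressed, unencrypted chunks that are already loaded into a dict keyed by chunk index. `CHUNK_SIZE`, `read_range` and the dict layout are hypothetical names chosen for the example, not Wyng's actual structures.

```python
from typing import Dict

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def read_range(chunks: Dict[int, bytes], volume_size: int,
               offset: int, size: int) -> bytes:
    """Return up to `size` bytes at `offset`, assembled from fixed-size chunks."""
    end = min(offset + size, volume_size)  # never read past the end of the volume
    out = bytearray()
    pos = offset
    while pos < end:
        # Which chunk holds `pos`, and where inside that chunk does it start?
        index, start = divmod(pos, CHUNK_SIZE)
        take = min(CHUNK_SIZE - start, end - pos)
        # Assumption: a chunk missing from the map is a hole and reads as zeros.
        data = chunks.get(index, b"\x00" * CHUNK_SIZE)
        out += data[start:start + take]
        pos += take
    return bytes(out)


# Usage: a 12-byte volume whose middle chunk is a hole.
example_chunks = {0: b"abcd", 2: b"ijkl"}
print(read_range(example_chunks, 12, 2, 8))
```

A real driver would presumably fetch each chunk file by the path in the merged list, then decompress and decrypt it before slicing, which is where the complications mentioned in this thread come in.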

This would be dead simple if particular compression & encryption features were not a factor. To be able to read the v0.4 format fully it would be simplest to have the driver include Wyng as a module... this means adhering to module schema and I've been moving in that direction by removing reliance on global vars.

tasket commented 1 year ago

The only difference between bands and Wyng chunks (besides compression/encryption encoding) is that Apple creates a complete picture of each session in each session dir hierarchy (they famously broke a Unix taboo and implemented dir hardlinks to make this more efficient), whereas Wyng's session dirs only contain differences from the previous session. So Wyng's merge_manifests() bridges that gap and gives us a complete volume for any given incremental session. Really, it's just a pre-sorted list of chunk file paths and hashes.
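The idea of building a complete view out of difference-only sessions can be sketched roughly as follows. This is not Wyng's actual merge_manifests(). It is a simplified illustration that assumes each session manifest is a dict mapping a chunk address to a (hash, path) pair, and that sessions are supplied oldest first. `Manifest` and `merge_sessions` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one manifest per session, chunk address -> (hash, relative path).
Manifest = Dict[int, Tuple[str, str]]


def merge_sessions(manifests: List[Manifest]) -> List[Tuple[int, str, str]]:
    """Merge manifests ordered oldest to newest into one address-sorted chunk list."""
    merged: Manifest = {}
    for manifest in manifests:
        # A later session's entry for an address replaces the earlier one.
        merged.update(manifest)
    return [(addr, digest, path) for addr, (digest, path) in sorted(merged.items())]


# Usage: the second session rewrote the chunk at address 4 and added one at 8.
first = {0: ("h0", "S1/0"), 4: ("h4", "S1/4")}
second = {4: ("h4b", "S2/4"), 8: ("h8", "S2/8")}
for entry in merge_sessions([first, second]):
    print(entry)
```

To mount an older session, the same merge would stop at that session instead of running through to the newest one.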