v6ak / qubes-incremental-backup-poc

proof of concept of incremental backup scheme for Qubes
https://groups.google.com/d/msgid/qubes-users/901b82dc-f781-4c13-ad00-33b4337fc84a%40googlegroups.com

Consider backup backends #35

Open v6ak opened 7 years ago

v6ak commented 7 years ago

Currently, we use Duplicity. The reason is not that it was carefully chosen as the best one; the reason is that I have some experience with it, even though I chose it in the past for quite a different scenario. So, I am collecting info about backup backends in order to decide well: https://docs.google.com/spreadsheets/d/1rUXn8VkR5nrrtDhywKBpNu2zuTHzOHDX6F053ynBSjw/edit?usp=sharing

Legend for features:

Legend for first column:

What do we want:

We will want at least one file-based backend and one block-based backend (qvm-backup or similar).

If you can fill in some missing info or suggest another great backend, write it here, please!

tasket commented 7 years ago

To me, the selection of a backup tool running in dom0 comes down to three criteria:

  1. Does it need interactive access to the destination media/filesystem? If so, it cannot be used.

  2. Does it scan all data to find deltas? This is what most tools do, and it's not terrible in usual practice because they can skip many small files according to modification date. But in dom0, nearly all our data is a handful of huge image files, so mod date becomes too coarse an indicator to be helpful in a majority of use cases. This is not a deal-breaker, but suffering it means the only efficiency gain we can anticipate is in storage space.

  3. Does the storage format allow old backups to be pruned arbitrarily by date, without compromising the integrity of the backup set? If not, the storage efficiency over non-incremental backups like qvm-backup will be marginal.

Factors being traded-off:

Factors that cannot be traded:

Duplicity doesn't excel at any of the trade-off factors except space on source. On laptops using SSDs it will incur high CPU and disk activity (which I think is a poor tradeoff for this type of equipment), and repeated full backups will mean elevated network intensity and delays.

A person keeping hundreds of GB of static reference data should not suffer a large penalty when they modify a small detail in a large disk image. But I came to the conclusion that an optimal tool cannot be selected for this use case, because it doesn't exist. What's missing is a runtime (dom0) environment that efficiently flags deltas, like an archive-bit at the block level.

Snapshot-capable storage systems (Btrfs, ZFS, thin-provisioned LVM) do track deltas in a fine-grained way (even for large image files and volumes), but they force the user to hang onto a significant amount of old data on the source PC. Their backup tools (e.g., btrfs-send) also assume some interactive control of the destination media, so isolation is an issue.

Apple's Time Machine may represent the best trade-off for all factors, since it was designed for high-frequency backups over the Internet as well as for the ability to manage sets without decrypting them. Time Machine uses sparsebundle bands to chunk data, which has a number of benefits. Although TM assumes interactive control of the destination, backup sets consisting of sparsebundles are flexible enough to be managed without full interactive access (e.g. a backupVM could easily perform necessary hardlinks or deletions without any back-and-forth between it and dom0, and increased trust isn't required for those tasks).

Something like Time Machine could be cobbled together using existing tools, but it would involve a FUSE filesystem which is not efficient for normal PC operations. It could be done better.

tasket commented 7 years ago

Here is a list of backup tools that may help you with your own list: https://github.com/restic/others

And another: http://changelog.complete.org/archives/9353-roundup-of-remote-encrypted-deduplicated-backups-in-linux

The ones that have caught my attention for possible use on Qubes are restic, Zbackup, ddar and bup. But they all have significant trade-offs.

v6ak commented 7 years ago

Hello, thank you for the list and for the comments.

To me, the selection of a backup tool running in dom0 comes down to three criteria:

  1. Does it need interactive access to the destination media/filesystem? If so, it cannot be used.

It depends on what you mean by filesystem. Some backends (e.g., Duplicati and Duplicity) have a rather limited set of basic operations (list, get, put, delete). In my QubesInterVMBackend for Duplicity, I allow only a limited set of characters in order to mitigate attacks by malformed filenames. If the files are properly authenticated and properly checked (which is something I would want anyway for obvious reasons), I don't see any problem here. If they aren't properly checked, then we have trouble when restoring.
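As an illustration of that mitigation, here is a minimal sketch of such a character whitelist (the helper name and the exact allowed set are hypothetical, not taken from the actual QubesInterVMBackend):

```python
import re

# Accept only a conservative ASCII whitelist so that a compromised
# storage VM cannot smuggle hostile names (path traversal, control
# characters, shell metacharacters) back into the trusted side.
SAFE_NAME = re.compile(r'[A-Za-z0-9._-]{1,255}')

def is_safe_filename(name: str) -> bool:
    """Reject names outside the whitelist, plus the dot-only names
    that could be interpreted as path components."""
    if name in ('.', '..'):
        return False
    return SAFE_NAME.fullmatch(name) is not None
```

Anything that fails the check would simply be ignored or reported, rather than passed on to the backup tool.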

Especially if we implement Merkle-tree-based authentication (which is something I want anyway), there is virtually no attack surface. Well, an attacker that controls the storage or the BackupStorageVM could still interrupt the restore process or remove backups, but nothing worse.
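To make the idea concrete, here is a toy Merkle-root computation (a simplified sketch, not the scheme proposed for this PoC: no domain separation, and a naive duplicate-last-node policy for odd levels). Dom0 only needs to remember the root; anything fetched back from an untrusted storage VM can then be verified against it:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute a Merkle root over a list of data blocks."""
    level = [h(leaf) for leaf in leaves]
    if not level:
        return h(b'')
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by duplication
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Any modification of a leaf changes the root, so tampering by the storage side is detected; the worst remaining attacks are denial of service and deletion, as noted above.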

  2. Does it scan all data to find deltas? This is what most tools do, and it's not terrible in usual practice because they can skip many small files according to modification date. But in dom0, nearly all our data is a handful of huge image files, so mod date becomes too coarse an indicator to be helpful in a majority of use cases. This is not a deal-breaker, but suffering it means the only efficiency gain we can anticipate is in storage space.

Good point. However, I am not planning to back up VMs from dom0 at this level.

  3. Does the storage format allow old backups to be pruned arbitrarily by date, without compromising the integrity of the backup set? If not, the storage efficiency over non-incremental backups like qvm-backup will be marginal.

Also a good point. But I don't think the advantage is marginal:

Factors that cannot be traded:

  • Dom0 isolation

I agree. Even now, I am trying to aggressively sanitize data, and not only for dom0. I want to use ASCII over UTF-8 where possible, avoid unbounded-size buffers, etc.

  • Encryption and verification layer

I agree. This however does not imply we need this layer in the backup software.

A person keeping hundreds of GB of static reference data should not suffer a large penalty when they modify a small detail in a large disk image. But I came to a conclusion that an optimal tool cannot be selected for this use case, because it doesn't exist. What's missing is a runtime (dom0) environment that efficiently flags deltas, like an archive-bit at the block level.

Again, I run the file-based backup for the VMs, and I plan to run it for dom0 as well, though for the home directory rather than for VM images.

Something like Time Machine could be cobbled together using existing tools, but it would involve a FUSE filesystem which is not efficient for normal PC operations. It could be done better.

Sure. But you can opt to use it just for some VMs. Once I implement support for multiple backends, you will be able to pick different trade-offs for different VMs.

-- Sent from my BlackBerry Android device with K-9 Mail. Please excuse my brevity.

tasket commented 7 years ago

Hello, thank you for the list and for the comments. To me, the selection of a backup tool running in dom0 comes down to three criteria:

  1. Does it need interactive access to the destination media/filesystem? If so, it cannot be used. It depends on what you mean by filesystem. Some backends (e.g., Duplicati and Duplicity) have a rather limited set of basic operations (list, get, put, delete). In my QubesInterVMBackend for Duplicity, I allow only a limited set of characters in order to mitigate attacks by malformed filenames. If the files are properly authenticated and properly checked (which is something I would want anyway for obvious reasons), I don't see any problem here. If they aren't properly checked, then we have trouble when restoring.

Storage complexity is part of the risk, yes, but so is the type of transaction... which sounds interactive (not push) in this case.

I believe best practice here would be to follow qvm-backup's example and have dom0 push data and commands to a backupVM, one-way. Only exception would be reception of short status codes (success/fail) and non-parsed feedback one typically sees with qvm-run.

As I mentioned earlier, I believe the push model is possible with incremental backup tools, just not all of them.

Especially if we implement Merkle-tree-based authentication (which is something I want anyway), there is virtually no attack surface. Well, an attacker that controls the storage or the BackupStorageVM could still interrupt the restore process or remove backups, but nothing worse.

Sounds interesting.

  2. Does it scan all data to find deltas? This is what most tools do, and it's not terrible in usual practice because they can skip many small files according to modification date. But in dom0, nearly all our data is a handful of huge image files, so mod date becomes too coarse an indicator to be helpful in a majority of use cases. This is not a deal-breaker, but suffering it means the only efficiency gain we can anticipate is in storage space. Good point. However, I am not planning to back up VMs from dom0 at this level.

Ah. Readme was not terribly clear on that point, so I assumed backup was handling image files directly.

Note this is still an efficiency issue for users with large files: databases, video footage, etc.

  3. Does the storage format allow old backups to be pruned arbitrarily by date, without compromising the integrity of the backup set? If not, the storage efficiency over non-incremental backups like qvm-backup will be marginal. Also a good point. But I don't think the advantage is marginal:
  • Imagine you perform a full backup once per three months and an incremental backup once per week (or even more often). The weekly backup is going to be tiny compared to the full backup. (It depends on how fast the data are changing.)

This leads to nasty surprises, however, when space is not monitored carefully and one cannot quite fit an incremental session on backup media... if you 'prune', you may be faced with erasing a full backup--and then performing one--or at least erasing days worth of incremental data before the current state can be backed-up via a larger/longer session. These are bad choices to give the user.

A true pruning capability means the size of the current backup session won't increase when backup disk space has to be recovered. And it means recent backups will likely be preserved when space becomes tight..... but in such a situation the user can choose any sessions for deletion with no additional impacts (user has to think only about which dates are no longer valuable, or allow backup tool to automatically remove oldest, etc.).
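The trade-off above can be put in toy numbers (all figures are illustrative assumptions, not measurements): with a quarterly full backup plus weekly increments, a chain-based format can only reclaim space a whole chain at a time, while a format with true pruning can drop any single week:

```python
# Assumed, illustrative sizes: a 50 GiB full backup every 13 weeks,
# plus ~1 GiB of incremental data per week.
full_gib, incr_gib, weeks_per_chain = 50, 1, 13

chain_gib = full_gib + incr_gib * (weeks_per_chain - 1)
print(f"one quarter-long chain: {chain_gib} GiB")  # 62 GiB

# Chain-based formats (e.g. classic Duplicity): increments depend on
# the preceding full, so the smallest safely prunable unit is a chain.
print(f"smallest prunable unit (chain-based): {chain_gib} GiB")

# True pruning (content-addressed stores): any single session can be
# dropped, at worst freeing one increment's worth of unique data.
print(f"smallest prunable unit (true pruning): {incr_gib} GiB")
```

This is why running out of space confronts the user with far worse choices under a chain-based format.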

  • File-based backups don't back up free space and can exclude ~/.cache etc., so even a full backup is smaller.

Even qvm-backup skips unallocated blocks. But skipping .cache is a good point; I wish Qubes treated it as a separate volume.

Factors that cannot be traded:

  • Dom0 isolation I agree. Even now, I am trying to aggressively sanitize data, and not only for dom0. I want to use ASCII over UTF-8 where possible, avoid unbounded-size buffers, etc.

If there is complex structure/meaning in the ASCII, it can be a hazard anyway... word of caution.

Something like Time Machine could be cobbled together using existing tools, but it would involve a FUSE filesystem which is not efficient for normal PC operations. It could be done better. Sure. But you can opt to use it just for some VMs. Once I implement support for multiple backends, you will be able to pick different trade-offs for different VMs.

That would be nice. (Sorry for the flattened reply structure... I used GH quoting function.)

v6ak commented 7 years ago

Storage complexity is part of the risk, yes, but so is the type of transaction... which sounds interactive (not push) in this case.

I believe best practice here would be to follow qvm-backup's example and have dom0 push data and commands to a backupVM, one-way. Only exception would be reception of short status codes (success/fail) and non-parsed feedback one typically sees with qvm-run.

What do you see as wrong with the interactive approach if it is properly authenticated? Maybe it leaks some data access patterns, but I don't think this is significant. Well, I understand that this is a nice property, but it is rather a nice-to-have.

Duplicity seems to behave somewhat this way, but I am not 100% sure. It might interactively check whether the metadata are the same and optionally download them. It uses a cache for metadata. Of course, it does not work this way in DVMs, because ~/.cache is lost when the DVM shuts down…

Especially if we implement Merkle-tree-based authentication (which is something I want anyway), there is virtually no attack surface. Well, an attacker that controls the storage or the BackupStorageVM could still interrupt the restore process or remove backups, but nothing worse.

Sounds interesting.

More details: https://github.com/v6ak/qubes-incremental-backup-poc/issues/37

Ah. Readme was not terribly clear on that point, so I assumed backup was handling image files directly.

Thank you for the feedback.

Note this is still an efficiency issue for users with large files: databases, video footage, etc.

Sure.

  3. Does the storage format allow old backups to be pruned arbitrarily by date, without compromising the integrity of the backup set? If not, the storage efficiency over non-incremental backups like qvm-backup will be marginal. Also a good point. But I don't think the advantage is marginal:
  • Imagine you perform a full backup once per three months and an incremental backup once per week (or even more often). The weekly backup is going to be tiny compared to the full backup. (It depends on how fast the data are changing.)

This leads to nasty surprises, however, when space is not monitored carefully and one cannot quite fit an incremental session on backup media... if you 'prune', you may be faced with erasing a full backup--and then performing one--or at least erasing days worth of incremental data before the current state can be backed-up via a larger/longer session. These are bad choices to give the user.

A true pruning capability means the size of the current backup session won't increase when backup disk space has to be recovered. And it means recent backups will likely be preserved when space becomes tight..... but in such a situation the user can choose any sessions for deletion with no additional impacts (user has to think only about which dates are no longer valuable, or allow backup tool to automatically remove oldest, etc.).

So, you are suggesting a decremental backup instead of an incremental one, right? It might be a good idea, but I feel this will come at some cost, like higher bandwidth and longer backup time.

  • File-based backups don't back up free space and can exclude ~/.cache etc., so even a full backup is smaller.

Even qvm-backup skips unallocated blocks. But skipping .cache is a good point; I wish Qubes treated it as a separate volume.

Unallocated blocks are not equal to free space. The two coincide only if TRIM is used, which is not always the case. (And with fstrim, the TRIM might be delayed.)

Well, ~/.cache is not the only directory to skip. In Maven/Gradle/Sbt projects, you might want to skip the “target” directory, which is harder to link to a separate volume automatically. And having a separate volume for ~/.cache and similar data places some demands on space management – you have to specify the size of each volume separately. I remember doing this for a somewhat different purpose (larger writeback for performance and lower SSD wear), and I don't want to manage it for each VM separately. I am not sure how much Qubes tries to target non-technical users, but this would not be a good step for them.

Factors that cannot be traded:

  • Dom0 isolation I agree. Even now, I am trying to aggressively sanitize data, and not only for dom0. I want to use ASCII over UTF-8 where possible, avoid unbounded-size buffers, etc.

If there is complex structure/meaning in the ASCII, it can be a hazard anyway... word of caution.

Sure. Or with … | bash -, even ASCII can do serious harm. Using ASCII over UTF-8 was just an example of how careful I am, not an ultimate security guide ☺

-- Sent from my BlackBerry Android device with K-9 Mail. Please excuse my brevity.

v6ak commented 7 years ago

I have tried Borg. It looks like deduplicated and compressed full FS snapshots. Like a Merkle tree, but a DAG instead of a tree. It seems to support prune well. So far, cool.

Deduplication is performed on chunks smaller than one file. However, the number of files stored in repo/data (so-called “segments”) seems to be significantly lower than the number of unique chunks (or even than the number of source files). It seems that every segment is created once and then never updated.
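The sub-file deduplication can be sketched as follows (a toy with fixed-size chunks; Borg actually cuts chunks at content-defined boundaries with a rolling hash, so the chunk size and helper name here are purely illustrative):

```python
import hashlib

def store_chunks(data: bytes, store: dict, chunk_size: int = 4):
    """Split data into chunks, address each by its SHA-256, and store
    every unique chunk at most once. Returns the list of chunk ids,
    which is all that a snapshot needs to reference the data."""
    ids = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        cid = hashlib.sha256(chunk).hexdigest()
        store.setdefault(cid, chunk)   # dedup: no-op if already present
        ids.append(cid)
    return ids
```

Because chunks are hash-addressed, two snapshots sharing content share storage automatically, which is what makes every backup a cheap "full" snapshot.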

If prune deletes a reference to a directory node and performs GC, it might need to re-upload and reorganize many segments, just because of removing one chunk from a segment. It definitely does not come for free.

Looking at a backup of my /usr, those files vary between 2.5 MiB and 13 MiB. (I hope that is not a viable side channel…) The official documentation mentions a 5 MiB file size. When backing up 100 GiB, this seems to result in roughly 20K files, which is not a small number, but it might be acceptable. If Borg used larger segments, more reorganization would be needed.
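A quick sanity check of that estimate (assuming the ~5 MiB target segment size from the documentation):

```python
# 100 GiB repository divided into ~5 MiB segment files.
repo_bytes = 100 * 1024**3
segment_bytes = 5 * 1024**2

segments = repo_bytes // segment_bytes
print(segments)  # 20480, i.e. "roughly 20K files"
```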

The challenge for Borg comes with using a custom storage backend – we want neither file-based storage nor SSH-based transmission. I see two or three ways there:

a. Create a server that proxies between Borg RPC and the BackupStorageVM. This would require implementing the 18 methods of the Borg RPC and listening for SSH on loopback. Sounds insane.

b. Create a FUSE-based filesystem that does the same. Maybe easier, maybe more hacky, but probably more universal, because any backup software that can back up to a filesystem would be able to use it.

c. Patch Borg. I don't see many advantages over option a.
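Whichever option is chosen, the trusted side ultimately drives a narrow set of storage operations, much like the list/get/put/delete set mentioned earlier for Duplicity-style backends. A hypothetical sketch of that interface (all names are illustrative, not from this PoC):

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """The narrow surface a BackupStorageVM proxy would expose,
    whether wired up as an RPC server (a), a FUSE layer (b),
    or a Borg patch (c)."""

    @abstractmethod
    def list(self) -> list: ...
    @abstractmethod
    def get(self, name: str) -> bytes: ...
    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...
    @abstractmethod
    def delete(self, name: str) -> None: ...

class InMemoryBackend(StorageBackend):
    """Trivial stand-in used only to exercise the interface."""
    def __init__(self):
        self._files = {}
    def list(self):
        return sorted(self._files)
    def get(self, name):
        return self._files[name]
    def put(self, name, data):
        self._files[name] = data
    def delete(self, name):
        del self._files[name]
```

Keeping the surface this small is what makes filename sanitization and Merkle-based verification tractable on the trusted side.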

A note on running file-based backup on img files in dom0: After all, maybe this is not a bad idea for some special cases, though it is different from what I originally planned. A pitfall: the backup backend would have to be able to treat block devices as files, as Qubes 4 switches to LVM. Another pitfall: I would not recommend doing this on a running VM without cloning the volume.

v6ak commented 7 years ago

We can hardly add additional backends before #37. Well, we theoretically could, but we would have to, for example, authenticate the backend name, which is not so easy.

I am not against discussing it now. I am just explaining what we are waiting for.

Currently, Borg looks good (though integration with a storage backend would be a bit painful), and Duplicati and Restic seem worth trying.