psy0rz / zfs_autobackup

ZFS autobackup is used to periodically back up ZFS filesystems to other locations. Easy to use and very reliable.
https://github.com/psy0rz/zfs_autobackup
GNU General Public License v3.0

Feature request: non-ZFS-backed storage as target for backups. #182

Closed blacklion closed 1 year ago

blacklion commented 1 year ago

Many hosting providers rent out generic disk space cheaply (Hetzner's StorageBox, for example), where you have access to the storage via WebDAV, scp, FTP, NFS, etc. (but not ssh). This storage space is not ZFS-backed and is not root- or even shell-accessible.

It would be nice to use this tool with such storage space as a target for backups.

Store snapshots as simple files, use sftp/scp/local file access to store them, "ls" to list them and "rm" to remove them, etc.

Scrin commented 1 year ago

[edit: this was a reply to a comment explaining that this would not be possible, but that comment was deleted]

Not entirely correct: you can absolutely zfs send/receive to and from plain files residing on any filesystem, on any media. This includes delta transfers; you can snapshot a 30TB dataset, zfs send it to a file, change a few hundred MB and zfs send an incremental delta into a file of a few hundred MB. Likewise, you can then zfs receive these "files" back into a ZFS filesystem and you'll have the datasets and snapshots just as if you had sent them directly. This is actually how I do some of my offsite backups: I do zfs sends to files that I upload to aws s3 deep glacier, with full snapshots roughly every year or two and incremental deltas in between.

However, there are some caveats, such as the lack of thinning on the target if the target consists of files instead of a zfs filesystem (you can't just delete an incremental snapshot file from the middle of a chain), and thus I don't know whether it's worth the extra complexity to implement such a feature in zfs_autobackup, but I'm not the person to make that decision.

blacklion commented 1 year ago

I have been using ZFS for more than 10 years and I understand very well what ZFS is and what a snapshot is.

You could do (and I've done it a multitude of times):

zfs send zroot@snapshot | ssh user@some.other.host 'cat > my.zfs.snapshot'

And later you could use this snapshot to restore ZFS:

ssh user@some.other.host 'cat my.zfs.snapshot' | zfs receive newzroot

(I've omitted obvious compression and encryption from pipes for brevity).
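For illustration, here is a minimal sketch of the compression step, assuming gzip is available on the sending and restoring machines. The function names `pack_stream` and `unpack_stream` are made up for this example; an encryption filter would slot into the same pipes in the same way.

```shell
# pack_stream: compress whatever arrives on stdin and write it to stdout.
pack_stream() {
    gzip -c
}

# unpack_stream: reverse of pack_stream, for use in front of zfs receive.
unpack_stream() {
    gzip -dc
}

# Intended use (not executed here; needs ZFS and an ssh-reachable host):
#   zfs send zroot@snapshot | pack_stream | ssh user@some.other.host 'cat > my.zfs.snapshot.gz'
#   ssh user@some.other.host 'cat my.zfs.snapshot.gz' | unpack_stream | zfs receive newzroot
```

Because both helpers are plain stdin-to-stdout filters, they can be tried on any data before being trusted with a real stream.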

You could store & use incremental snapshots in the same way, no problem. And all this "freeze this virtual machine hosting the database for a second and replicate the few GB to another ZFS-Storage." will work without "another ZFS-Storage", with "any other storage" as well. It will not be a "hot backup", but it will be a very useful backup.

And no, I don't store VM disks on ZFS; I use it as a normal filesystem for system & user files, instead of UFS2 or ext4 (root-on-ZFS).

Simon4711 commented 1 year ago

o.k., sorry guys.

blacklion commented 1 year ago

> However, there are some caveats, such as the lack of thinning on the target if the target consists of files instead of a zfs filesystem (you can't just delete an incremental snapshot file from the middle of a chain), and thus I don't know whether it's worth the extra complexity to implement such a feature in zfs_autobackup, but I'm not the person to make that decision.

You could store a "yearly (full) - monthly (incremental to yearly) - weekly (incremental to monthly) - daily (incremental to weekly) - hourly (incremental to daily)" chain of snapshots (in files) and remove all "previous XXX" snapshots, which needs some automatic management.
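As a rough sketch of that hierarchy: each level is sent incrementally against the newest snapshot of the level above it, so a restore needs at most one stream file per level. The function names `parent_level` and `restore_levels` are made up for this illustration and are not something zfs_autobackup provides.

```shell
# parent_level LEVEL: print the level whose newest snapshot serves as the
# incremental base for LEVEL. Prints an empty line for "yearly", which is
# stored as a full stream.
parent_level() {
    case "$1" in
        hourly)  echo daily ;;
        daily)   echo weekly ;;
        weekly)  echo monthly ;;
        monthly) echo yearly ;;
        yearly)  echo "" ;;
        *) echo "unknown level: $1" >&2; return 1 ;;
    esac
}

# restore_levels LEVEL: print, oldest first, the levels whose stream files
# have to be received in order to restore a snapshot of LEVEL.
restore_levels() {
    local level="$1" chain="$1"
    parent_level "$level" >/dev/null || return 1
    while level=$(parent_level "$level") && [ -n "$level" ]; do
        chain="$level $chain"
    done
    echo "$chain"
}
```

For example, `restore_levels daily` prints `yearly monthly weekly daily`. Once a newer monthly stream has been stored, the older monthly file and everything that was incremental to it can be removed together, which is the "automatic management" part.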

blacklion commented 1 year ago

The problem is that ZFS storage is expensive, file storage is cheap, and zfs_autobackup looks like the best of the available solutions for scheduled, hierarchical ZFS snapshotting. There are other tools, like zfsnap and such, but zfs_autobackup looks like the most complete, robust and mature among them.

But the requirement to have a fully featured ZFS system with a fully featured shell on the "other end" is very constraining.

A feature to support different "backends" looks very useful.

Scrin commented 1 year ago

> You could store a "yearly (full) - monthly (incremental to yearly) - weekly (incremental to monthly) - daily (incremental to weekly) - hourly (incremental to daily)" chain of snapshots (in files) and remove all "previous XXX" snapshots, which needs some automatic management.

That would work; however, it is a little different from the "normal" thinner behavior, and the space overhead would be a bit bigger, but still smaller than storing just full snapshots. Another alternative would be "redoing" the affected snapshots while thinning, if those snapshots still exist on the source, but this would incur additional bandwidth for the transfer and require keeping the snapshots on the source for longer, so it's not perfect either.

> The problem is that ZFS storage is expensive, file storage is cheap, and zfs_autobackup looks like the best of the available solutions for scheduled, hierarchical ZFS snapshotting. There are other tools, like zfsnap and such, but zfs_autobackup looks like the most complete, robust and mature among them.

With this I fully agree, and my current setup is somewhat based on it: I use zfs_autobackup to back up a group of hosts to a "backupper host" (running "a full ZFS", obviously), and then a custom solution to zfs send snapshots from there to files and upload them to aws deep glacier, among other places, for more long-term storage, since storage space there is dirt cheap and "thinning" is not necessarily even worth the effort. I would love to replace this part of the custom solution with zfs_autobackup as well if such a feature were to become available, like I originally replaced my custom zfs backup solution (the first half of my "backup chain") with zfs_autobackup.

Simon4711 commented 1 year ago

But the differential, space- and bandwidth-saving solution is only possible with zfs to zfs, or am I wrong? If I sync a zfs snapshot of a 30 TB dataset to a file on ext4 via ssh, it will always be a 30 TB sync and it will always use 30 TB of space. So 2 backups are 60 TB, 3 are 90 TB and so on. So zfs-autobackup is based on the replication of snapshots from zfs to zfs. Another problem could be losing the zfs resume function in send | receive, one of the greatest advantages of zfs-autobackup: it continues the transfer from zfs to zfs instead of doing another full sync after a crash of the machine or the internet connection.

Scrin commented 1 year ago

The size of the file will be the size of the stream, which is the size of the net delta zfs has to send (a "full send" is a delta from nothing, and is thus roughly equal to the size of the dataset).

So for example, let's say you have a 30TB dataset. You snapshot it as snap1 and then do zfs send dataset@snap1 > dataset-full-snap1.zfs; this results in dataset-full-snap1.zfs being roughly 30TB. Next you change, say, 1TB of that dataset and snapshot it as snap2. If you now do zfs send -i dataset@snap1 dataset@snap2 > dataset-incremental-snap1-snap2.zfs, the resulting dataset-incremental-snap1-snap2.zfs file is roughly 1TB in size.
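To put numbers on it, here is a small back-of-the-envelope sketch using the sizes from this thread, in whole TB. The function names are made up for the illustration.

```shell
# chain_size FULL_TB INCR_TB...: space used by one full stream file plus
# any number of incremental stream files.
chain_size() {
    local total=0 size
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

# fulls_size DATASET_TB COUNT: space used if every backup were a full stream.
fulls_size() {
    echo $(( $1 * $2 ))
}

chain_size 30 1   # full snap1 + incremental snap1->snap2
fulls_size 30 2   # two full streams of the same dataset
```

The first call prints 31 and the second prints 60. On a real system, zfs send -n -v with the same arguments should print an estimated stream size without sending anything, which is a cheap way to check these numbers beforehand.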

However, in order to zfs receive dataset-incremental-snap1-snap2.zfs, you need to have snap1 already present, so for backup purposes you have to keep both of these files even if you don't "need" snap1 anymore after you took snap2. This is why you can't easily use the thinner in its current form if the target consists of files rather than a "full zfs filesystem".
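A minimal sketch of what that dependency means at restore time, assuming the file naming used above (NAME-full-SNAP.zfs and NAME-incremental-FROM-TO.zfs), one full stream per dataset, and snapshot names without hyphens. The function name `restore_order` is made up for this example.

```shell
# restore_order DIR NAME: print the stream files for dataset NAME in the
# order they have to be received: the full stream first, then each
# incremental whose FROM snapshot is the one restored just before it.
restore_order() {
    local dir="$1" name="$2" full current next
    full=$(ls "$dir" | grep "^${name}-full-.*\.zfs\$" | head -n 1)
    if [ -z "$full" ]; then
        echo "no full stream found for $name" >&2
        return 1
    fi
    echo "$full"
    # Snapshot name carried by the full stream, e.g. "snap1".
    current=${full#"${name}-full-"}
    current=${current%.zfs}
    while :; do
        next=$(ls "$dir" | grep "^${name}-incremental-${current}-[^-]*\.zfs\$" | head -n 1)
        [ -n "$next" ] || break
        echo "$next"
        # The TO snapshot of this file is the FROM snapshot of the next one.
        current=${next#"${name}-incremental-${current}-"}
        current=${current%.zfs}
    done
}

# Intended use (not executed here; needs ZFS, and pool/restored is a placeholder):
#   restore_order /backups dataset | while read -r f; do
#       zfs receive pool/restored < "/backups/$f"
#   done
```

If any file in the middle of the chain is missing, the loop stops there and everything after it cannot be restored, which is exactly the thinning problem.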

Simon4711 commented 1 year ago

Ah, o.k., thanks for the new knowledge that I can send just an incremental snapshot to any filesystem using -i. But it is useless for me to have to keep every snapshot in the incremental chain just to be able to restore the latest one. And I also would not trust snapshot data received on a simple filesystem without zfs checksums. It should all be o.k., but the backup can also be rubbish after 60 days of incrementals if the 2nd snapshot file was corrupted when it was synced to the ext4 filesystem.

Scrin commented 1 year ago

It doesn't have to be an ext4 filesystem. It might very well actually be ZFS, but the storage provider just doesn't expose that; instead they expose a different interface, for example (aws) S3 in my case. But this is getting rather off-topic for the original issue.

If you want to continue the general discussion regarding the benefits, drawbacks and caveats of keeping zfs snapshots in regular files, I'd suggest you start a new discussion in the Discussions section.

psy0rz commented 1 year ago

This feature would be nice to have, but would be a lot of work. I'll leave the request open, but it's probably not going to happen any time soon.

digitalsignalperson commented 1 year ago

I've had success with zfsbackup-go saving and restoring zfs to S3. I was using it to do a full replication every month and an incremental every day in between. Maybe some ideas from that project can be of use here.

sfatula commented 1 year ago

A few more caveats that I believe are true. I don't believe the format of the snapshot streams is guaranteed to be compatible between versions of zfs, at least major ones. When you are doing your "backups", zfs recv does some error checking to make sure there is no corrupted data. Without using it, you would not have that, so you'd need to do some sort of checksumming. If you had to restore from 50 incrementals back, it would be kind of a pain, but I guess if you have something managing it, it might be ok. And the obvious, I guess: if you are just storing the plain files on filesystems without bitrot protection, I really wouldn't trust a restore anyway.
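A minimal sketch of that "some sort of checksumming", assuming GNU coreutils sha256sum is available (other systems ship differently named tools). The function names `seal_stream` and `check_stream` are made up for this example. It only detects a damaged file; it cannot repair one, and it does nothing about stream format compatibility.

```shell
# seal_stream FILE: write a checksum sidecar FILE.sha256 next to FILE.
seal_stream() {
    (
        cd "$(dirname "$1")" &&
        sha256sum "$(basename "$1")" > "$(basename "$1").sha256"
    )
}

# check_stream FILE: succeed only if FILE still matches FILE.sha256.
check_stream() {
    (
        cd "$(dirname "$1")" &&
        sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1
    )
}

# Intended use (paths are placeholders):
#   zfs send dataset@snap1 > /backups/dataset-full-snap1.zfs
#   seal_stream /backups/dataset-full-snap1.zfs
#   ...later, before restoring...
#   check_stream /backups/dataset-full-snap1.zfs && zfs receive pool/restored < /backups/dataset-full-snap1.zfs
```

Running check_stream over every file in a chain before starting a restore at least tells you up front whether the 2nd file out of 60 has gone bad.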

psy0rz commented 1 year ago

Yeah, kind of defeats the point of ZFS imho. There are better backup solutions for that, which don't use ZFS at all.