psy0rz / zfs_autobackup

ZFS autobackup is used to periodically back up ZFS filesystems to other locations. Easy to use and very reliable.
https://github.com/psy0rz/zfs_autobackup
GNU General Public License v3.0
580 stars 62 forks

zfs-rsync command? #114

Open psy0rz opened 2 years ago

psy0rz commented 2 years ago

There are some requests for making zfs-autobackup behave more like rsync, and I agree. However, I think it will be a separate command, which of course uses the same codebase but has some different parameters and features.

Please help in discussing what such a tool should and shouldn't do.

To compare, first we start with the features of zfs-autobackup with its default settings:

A tool like zfs-rsync, should perhaps have these features with its default settings:

More details and optional stuff for zfs-rsync:

(regarding #41 and #113 )

Scrin commented 2 years ago

I would love to have a "more like rsync" tool for migration purposes in addition to the zfs_autobackup for backups. Currently I have my own script for such cases, but a "properly maintained" solution would definitely be better. To me the above looks good, with a few comments/opinions/ideas:

psy0rz commented 2 years ago

Thanks for the reply! I agree with your points; I think they already fall somewhat under the extra options mentioned above:

but indeed, sending the current data over should be easy and not require too many options or thinking

digitalsignalperson commented 2 years ago

I'd be interested in making it as close to mirroring everything as possible:

One hack I've been considering to achieve this, for mirroring a pool to rotating offsite drives, is:

This gives you a robust (at least in the sense that there are no external tools to maintain) and exact mirror of the pool, but the con is that you get a permanent warning about the pool being in a degraded state. Reference: https://serverfault.com/a/641217

Additional aside: I almost wonder if you could do the offline/online sync to remote servers as well, using a network block device or similar: https://unix.stackexchange.com/questions/119364/how-can-i-mount-a-block-device-from-one-computer-to-another-via-the-network-as-a

psy0rz commented 2 years ago

Ha! Did I read that idea of swapping disks on reddit r/zfs? I wanted to mention zfs-autobackup there, but I don't want to be too spammy with it :)

psy0rz commented 2 years ago

Indeed, I forgot to mention the --delete or --destroy option that will delete missing stuff. Again, just like rsync, which is one of my all-time favorite CLI tools :)

psy0rz commented 2 years ago

i also agree it should be fairly easy to use it to make a close-as-possible mirror.

you could go one step (option) further and have it sync over changed properties as well. (not just the first time, like zfs-autobackup does)

digitalsignalperson commented 2 years ago

I don't think I saw the reddit thread, have a link? So I can learn all the grave warnings for why not to do it 😅 The person on Server Fault said:

Just a quick update: over the past year this approach has worked well enough. Monthly restore tests of the offsite backup have been successful and consistent. Rotating an array (rather than a single disk) would be better to provide a level of redundancy in the offsite copy, and I would recommend doing that if possible. Overall this is still a hackish approach and does introduce some risk, but has provided a reasonably safe and inexpensive offsite backup of our data.

psy0rz commented 2 years ago

Sorry, can't find it anymore :( But if you google it you can find more people who tried to do this. I think it's fairly OK, but a zfs-rsync would indeed allow you to do it offsite.

digitalsignalperson commented 2 years ago

any thoughts on bi-directional-ish zfs-rsync? ...zfs-syncthing?

Tangentially on topic... I'm currently testing on 3 VMs simulating 1 server, 2 workstations (root@golden-image1, root@golden-image2)

Loop this on server

# Push 1
zfs-autobackup -v \
    --ssh-target root@golden-image1 \
    --keep-source 5,1min5min \
    --keep-target 5,1min5min \
    --no-holds \
    --min-change 1 \
    --strip-path=1 \
    data rpoolx

# Push 2
zfs-autobackup -v \
    --ssh-target root@golden-image2 \
    --keep-source 5,1min5min \
    --keep-target 5,1min5min \
    --no-holds \
    --min-change 1 \
    --strip-path=1 \
    data rpoolx

# Pull 1
zfs-autobackup -v \
    --ssh-source root@golden-image1 \
    --keep-source 5,1min5min \
    --keep-target 5,1min5min \
    --no-holds \
    --min-change 1 \
    --strip-path=1 \
    data rpoolx

# Pull 2
zfs-autobackup -v \
    --ssh-source root@golden-image2 \
    --keep-source 5,1min5min \
    --keep-target 5,1min5min \
    --no-holds \
    --min-change 1 \
    --strip-path=1 \
    data rpoolx

Thinning schedule just for testing purposes.
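
The four commands above differ only in direction (`--ssh-target` vs `--ssh-source`) and host, so they could be folded into one helper. This is a minimal sketch, not part of zfs-autobackup: the `ZAB`, `HOSTS` and `SCHEDULE` variables and the `sync_one` / `sync_all` function names are made up for illustration, and `ZAB` defaults to a dry run that only prints each command.

```shell
#!/bin/sh
# Sketch: one helper instead of four copies of the same command.
# ZAB, HOSTS, SCHEDULE, sync_one and sync_all are illustrative names only.
# By default this is a dry run that just prints each command.
ZAB="${ZAB:-echo zfs-autobackup}"
HOSTS="${HOSTS:-root@golden-image1 root@golden-image2}"
SCHEDULE="5,1min5min"   # test-only thinning schedule from the commands above

# sync_one DIRECTION HOST
# DIRECTION is "target" (push to HOST) or "source" (pull from HOST).
sync_one() {
    $ZAB -v \
        "--ssh-$1" "$2" \
        --keep-source "$SCHEDULE" \
        --keep-target "$SCHEDULE" \
        --no-holds \
        --min-change 1 \
        --strip-path=1 \
        data rpoolx
}

# sync_all: push to every host, then pull from every host,
# in the same order as the four commands above.
sync_all() {
    for host in $HOSTS; do sync_one target "$host"; done
    for host in $HOSTS; do sync_one source "$host"; done
}

sync_all
```

To run it for real rather than print the commands, set `ZAB=zfs-autobackup` in the environment and wrap the `sync_all` call in whatever loop is used on the server.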

Dumbly pushing and pulling data from the workstations. Scenario of me as a single user having multiple PCs and laptops, keeping some datasets in sync.

e.g.

rpoolx/DATA/media
rpoolx/DATA/email
rpoolx/DATA/workdata1
rpoolx/DATA/workdata2

Timing it: even when there are no changes, each run takes some time to analyze and exit. (Aside: curious what causes the analysis to be "slow".)

    Push 1  7.8s
    Push 2  7.9s
    Pull 1  7.0s
    Pull 2  7.1s

    Total 29.8s

I can make a change to any dataset on any workstation and it takes at least ~30 s to propagate everywhere (or sometimes it has to loop twice to deal with thinning). The "--min-change 1" is key to this working. Could probably run the (push, pull) pair for each workstation in parallel to tighten this up, so the total time does not grow with the number of nodes.
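
A rough sketch of that parallel idea, using plain shell background jobs: each workstation gets its own (push, pull) pair, and the script waits for all of them. The `ZAB` and `HOSTS` variables and the function names are hypothetical, `ZAB` defaults to a dry run that only prints the commands, and whether concurrent zfs-autobackup runs against the same datasets interfere with each other (snapshot creation, thinning) is an open question that this sketch does not address.

```shell
#!/bin/sh
# Sketch: run the (push, pull) pair for each workstation as a background job.
# ZAB, HOSTS and the function names are illustrative, not part of zfs-autobackup.
# ASSUMPTION: concurrent runs on the same datasets are safe; this is untested.
ZAB="${ZAB:-echo zfs-autobackup}"   # dry run by default
HOSTS="${HOSTS:-root@golden-image1 root@golden-image2}"

# run_zab DIRECTION HOST ("target" pushes to HOST, "source" pulls from it)
run_zab() {
    $ZAB -v "--ssh-$1" "$2" \
        --keep-source 5,1min5min --keep-target 5,1min5min \
        --no-holds --min-change 1 --strip-path=1 \
        data rpoolx
}

# sync_host HOST: push, then pull, for a single workstation
sync_host() {
    run_zab target "$1" && run_zab source "$1"
}

# sync_parallel: one background job per host.
# Returns non-zero if any host's push or pull failed.
sync_parallel() {
    pids=""
    for host in $HOSTS; do
        sync_host "$host" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

sync_parallel
```

With this layout the wall-clock time per loop would be roughly that of the slowest workstation rather than the sum over all of them, assuming the runs do not block each other.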

Not seeing any conflicts, because I'm only editing files on one workstation at a time, even with the datasets mounted in all places (atime=off in case concurrent browsing causes a change). I can do things like take a laptop offline offsite and automatically push back changes later. If I intentionally create a conflict by writing to two workstations within 1 minute, nothing explodes: I see the error and just roll back and/or copy the change manually. In practice I might take responsibility for having only one workstation "check out" the dataset at a time with a read/write mount.

There could be some potential to automatically detect conflicts, roll back/resolve, and rsync conflicting changes to the server. Or maybe, more robust and seamless, a FUSE driver and locking mechanism to always ensure consistent replication... z(c)luster??

digitalsignalperson commented 10 months ago

From my comment here: https://github.com/psy0rz/zfs_autobackup/issues/113#issuecomment-1793903705 Something looking closer to rsync for zfs?