psy0rz / zfs_autobackup

ZFS autobackup is used to periodically back up ZFS filesystems to other locations. Easy to use and very reliable.
https://github.com/psy0rz/zfs_autobackup
GNU General Public License v3.0

specify source dataset(s) instead of property name #113

Open digitalsignalperson opened 2 years ago

digitalsignalperson commented 2 years ago

I'm wondering about the design choice of requiring an autobackup:$name property to be set in order to select source datasets.

What are the advantages compared to just providing e.g.

Or, if there's any insight into the design choice, I'd be curious to hear it.

Cons of using a property to manage the config:

The code change looks like it would be clean (I don't see any other use of 'property_name'): change source_datasets = source_node.selected_datasets(property_name=property_name, ... to source_datasets = a list parsed from a command-line argument.
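A minimal sketch of the proposed change, assuming Python's argparse. The backup name stays a positional argument (it is still needed to name snapshots and holds, per the edit note), and any trailing arguments become an explicit source list that takes precedence over property-based selection. build_parser, select_sources and the property_selected parameter are hypothetical names. property_selected stands in for whatever source_node.selected_datasets(...) returns. This is not zfs-autobackup's actual parser.

```python
import argparse
from typing import List, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, simplified CLI; not zfs-autobackup's real argument parser.
    parser = argparse.ArgumentParser(prog="zfs-autobackup-sketch")
    # Still used to name snapshots and holds.
    parser.add_argument("backup_name")
    parser.add_argument("target_path")
    # Optional explicit sources; an empty list means "fall back to the property".
    parser.add_argument("source_datasets", nargs="*")
    return parser


def select_sources(args: argparse.Namespace, property_selected: Sequence[str]) -> List[str]:
    """Return the explicit sources if any were given, else the property-selected ones."""
    if args.source_datasets:
        return list(args.source_datasets)
    return list(property_selected)
```

Falling back to the property when no sources are given keeps existing invocations working unchanged.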

Possibly related to #41 (rsync for zfs?? zfsync src_pool/data dst_pool/data)

Curious to hear your thoughts, cheers!

Edit: I wasn't thinking about snapshots and holds, which use self.args.backup_name; that could remain an argument for naming purposes. In my case I'd use --no-snapshot --no-holds anyway.

psy0rz commented 2 years ago

I agree; I'm creating a separate zfs-rsync issue for this.

On which snapshots should it operate? Just the latest common one? And if you run it again and there are newer snapshots, should it send those increments to the other side as well?
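One possible answer to these questions, sketched in Python: find the newest snapshot both sides share, then treat every newer source snapshot as an increment to send. Running it again later simply recomputes the plan, so newer source snapshots would be picked up as further increments. This is a simplified model, not zfs-autobackup's algorithm: it matches snapshots by name, whereas a more robust tool might compare each snapshot's guid property. plan_sync is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def plan_sync(source: List[str], target: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (latest common snapshot, source snapshots to send after it).

    Both lists hold snapshot names (the part after '@'), oldest first.
    """
    on_target = set(target)
    # Walk the source history newest-first to find the latest shared snapshot.
    for index in range(len(source) - 1, -1, -1):
        if source[index] in on_target:
            return source[index], source[index + 1:]
    # Nothing in common: the first snapshot would need a full send,
    # after which the rest could follow as increments.
    return None, list(source)
```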

psy0rz commented 2 years ago

Please have a look at #114 and comment over there.

Scrin commented 2 years ago

There are definitely pros and cons to both approaches, and which one is better depends heavily on context. I can't speak to the original design choice, but for me the ability to define what to back up on the "source system" rather than on the "backupper" was the main reason I switched my primary backup solution to zfs_autobackup.

In my primary infrastructure I have a number of servers, each holding both "critical" and "non-critical" data (critical being things like databases; non-critical being things like configuration, or data that can be trivially recreated on demand), and which data exists on each server depends on the services running there.

The zfs_autobackup design lets me simplify the backup side of my infrastructure setup and configuration: the setup scripts for each service (mainly Ansible) create the ZFS datasets the service needs, set their properties (such as tagging the critical datasets for backup), and of course set up the services themselves.

This way, when a new service is added to my infrastructure or an existing one is deployed to a new server, everything that needs to be done can be done on that server alone; the backuppers that back up all the servers don't need to know which datasets are important to back up. The only "knowledge" my backuppers need is which servers to back up, where to back them up to, and what the zpool names are. Thus "config" changes only need to be made on the server that holds the data affected by the change, whether that means adding a completely new service or setting up a new instance of an existing one.
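To illustrate why the backupper needs no per-dataset knowledge in this setup: it can discover what to back up by reading the property on the source, for example by running zfs get -H -o name,value,source -r autobackup:offsite1 rpool over SSH. The sketch below parses that tab-separated output using a simplified model of the property values (true, false, child). offsite1 is just an example backup name, select_by_property is a hypothetical helper, and zfs-autobackup's real selection logic handles more cases than this.

```python
from typing import List


def select_by_property(zfs_get_output: str) -> List[str]:
    """Pick datasets from tab-separated `zfs get -H -o name,value,source` output.

    Simplified model of property-based selection (not zfs-autobackup's code):
      true  -> selected (set locally, or inherited by children)
      false -> excluded (children inherit the exclusion)
      child -> selected only where inherited: children, not the dataset itself
      -     -> property not set, ignored
    """
    selected: List[str] = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue
        name, value, source = line.split("\t", 2)
        inherited = source.startswith("inherited")
        if value == "true" or (value == "child" and inherited):
            selected.append(name)
    return selected
```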

psy0rz commented 2 years ago

@Scrin Good point, I should state that more clearly in the documentation. zfs-autobackup lets other tools or admins select datasets on the source system without needing access to the backup server at all.

digitalsignalperson commented 2 years ago

Thanks for sharing; I can see how the property is useful depending on the scenario. To avoid a breaking change, that could stay the default behavior, with a new optional argument to supply a list of sources instead (or a txt or YAML file containing the list of sources).
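If the list of sources lived in a text file, as suggested, parsing it could be as simple as the following sketch. The file format (one dataset per line, # for comments) and the parse_source_list name are assumptions. A YAML variant would need a third-party parser such as PyYAML and is left out here.

```python
from typing import List, Set


def parse_source_list(text: str) -> List[str]:
    """Parse a hypothetical plain-text source list: one dataset per line.

    Blank lines and lines starting with '#' are ignored; duplicates are
    dropped while keeping the first occurrence's position.
    """
    sources: List[str] = []
    seen: Set[str] = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in seen:
            seen.add(line)
            sources.append(line)
    return sources
```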

psy0rz commented 2 years ago

That's true. I could perhaps add --select=..., --select-child=... and --select-single=... (non-recursive).

The rest of the syntax stays the same and you won't need to set properties (but you still can, and you could use both if you want).
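A sketch of how these proposed flags might resolve against the list of existing datasets, merged with any property-selected datasets so that both mechanisms can be used together. The semantics (--select: the dataset and everything below it; --select-child: descendants only; --select-single: the dataset alone) are an interpretation of the proposal. resolve_selection and is_descendant are hypothetical functions, not part of zfs-autobackup.

```python
from typing import List, Sequence, Set


def is_descendant(name: str, parent: str) -> bool:
    """True if dataset `name` lies strictly below `parent`."""
    return name.startswith(parent + "/")


def resolve_selection(all_datasets: Sequence[str],
                      select: Sequence[str] = (),
                      select_child: Sequence[str] = (),
                      select_single: Sequence[str] = (),
                      property_selected: Sequence[str] = ()) -> List[str]:
    # Start from whatever the autobackup property already selected.
    chosen: Set[str] = set(property_selected)
    for root in select:          # --select: the dataset and all descendants
        chosen.update(d for d in all_datasets if d == root or is_descendant(d, root))
    for root in select_child:    # --select-child: descendants only
        chosen.update(d for d in all_datasets if is_descendant(d, root))
    for name in select_single:   # --select-single: just this dataset
        if name in all_datasets:
            chosen.add(name)
    # Preserve the order of all_datasets in the result.
    return [d for d in all_datasets if d in chosen]
```

The descendant check appends "/" to the parent, so that, for example, rpool/database is not treated as a child of rpool/data.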

digitalsignalperson commented 9 months ago

Any thoughts on this for a PR? https://github.com/digitalsignalperson/zfs_autobackup/compare/d0b58b98e7971493bac30a85fa4ec3e8e0192878...digitalsignalperson:zfs_autobackup:v3.1.2-hacks

example usage:

zfs-autobackup -v \
    --no-holds \
    --no-thinning \
    --no-snapshot \
    --other-snapshots \
    --min-change 1 \
    --strip-path=1 \
    --clear-mountpoint \
    backupname-does-nothing-here \
    rpool/test-destination \
    rpool/recursive-source-dataset/\* \
    rpool/some-source-dataset \
    rpool/some-other-source-dataset

I chose to skip selecting datasets via the BACKUP-NAME property when source paths are specified, but that could still be made an option. The BACKUP-NAME parameter is still used for snapshots and thinning in general, just not in this example, which uses --no-snapshot and --no-thinning.
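How the linked branch interprets rpool/recursive-source-dataset/\* isn't shown in this thread (the backslash only keeps the shell from expanding the * itself). One plausible approach, sketched here as an assumption, is to match source arguments that contain wildcards against the existing dataset names with Python's fnmatch module. Because fnmatch's * also matches /, such a pattern selects every descendant but not the parent dataset itself. expand_sources is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def expand_sources(args: Sequence[str], all_datasets: Sequence[str]) -> List[str]:
    """Expand source arguments against existing dataset names.

    Arguments without wildcard characters are taken literally; arguments
    containing '*', '?' or '[' are matched with fnmatch, whose '*' also
    matches '/', so 'pool/a/*' selects every descendant of pool/a
    (but not pool/a itself). Duplicates are dropped, order is kept.
    """
    result: List[str] = []
    for arg in args:
        if any(ch in arg for ch in "*?["):
            matches = [d for d in all_datasets if fnmatchcase(d, arg)]
        else:
            matches = [arg]
        for name in matches:
            if name not in result:
                result.append(name)
    return result
```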

Using it as a snapshot-only tool without a TARGET-PATH is a little awkward given the order of the arguments. I allowed "/None" to be used as a target path to work around this, but there may be a more sensible way to order the arguments or add other options.

psy0rz commented 9 months ago

Hmm, I'm not sure whether I already responded to this somewhere?

I think this solution is too hackish; I would rather see --select-... options for this, for example:

zfs-autobackup -v \
    --no-holds \
    --no-thinning \
    --no-snapshot \
    --other-snapshots \
    --min-change 1 \
    --strip-path=1 \
    --clear-mountpoint \
    --select-recursive=rpool/recursive-source-dataset \
    --select=rpool/some-source-dataset \
    --select=rpool/some-other-source-dataset \
    backupname-does-nothing-here \
    rpool/test-destination 

The select options should behave consistently with https://github.com/psy0rz/zfs_autobackup/wiki/Manual#dataset-property

e.g. something like --select, --select-recursive, --select-exclude, --select-child

And perhaps ignore the autobackup property when --select is used, or something along those lines.
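One way to make the select options behave consistently with the dataset property is to translate each flag into a virtual property value and resolve it with the same nearest-ancestor inheritance that ZFS uses for properties. The sketch below assumes --select-recursive maps to true, --select-child to child, --select-exclude to false, and that --select picks a single dataset. It ignores the autobackup property whenever any select flag is given and falls back to the property-selected list otherwise. All names are hypothetical; this is an illustration, not a planned implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def nearest_assignment(name: str, assigned: Dict[str, str]) -> Optional[Tuple[str, bool]]:
    """Mimic ZFS property inheritance for a dataset name.

    Returns (value, inherited) from the closest ancestor-or-self that has an
    assignment, or None if no assignment applies.
    """
    parts = name.split("/")
    for depth in range(len(parts), 0, -1):
        candidate = "/".join(parts[:depth])
        if candidate in assigned:
            return assigned[candidate], candidate != name
    return None


def select_with_flags(all_datasets: Sequence[str],
                      property_selected: Sequence[str],
                      select: Sequence[str] = (),
                      select_recursive: Sequence[str] = (),
                      select_child: Sequence[str] = (),
                      select_exclude: Sequence[str] = ()) -> List[str]:
    # No select flags given: keep the current behaviour and use the property.
    if not (select or select_recursive or select_child or select_exclude):
        return list(property_selected)

    # Translate flags into virtual property values. Exclusions are applied
    # last, so they win if a dataset is passed to more than one flag.
    assigned: Dict[str, str] = {}
    for name in select_recursive:
        assigned[name] = "true"
    for name in select_child:
        assigned[name] = "child"
    for name in select_exclude:
        assigned[name] = "false"

    singles = set(select)
    selected: List[str] = []
    for name in all_datasets:
        found = nearest_assignment(name, assigned)
        value, inherited = found if found is not None else ("-", False)
        if value == "false":
            continue  # excluded directly or through inheritance
        if value == "true" or (value == "child" and inherited) or name in singles:
            selected.append(name)
    return selected
```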

Edwin