redmop / sanoid

Policy-driven snapshot management and replication tools. Currently using ZFS for underlying next-gen storage, with explicit plans to support btrfs when btrfs becomes more reliable. Primarily intended for Linux, but BSD use is supported and reasonably frequently tested.
http://www.openoid.net/products/
GNU General Public License v3.0

Sanoid and Syncoid - Snapshots taken by this script are non-atomic #17

Open redmop opened 8 years ago

redmop commented 8 years ago
The problem:

Snapshots are not taken atomically. Each object in the config file is snapshotted individually.

Example:

DatasetA holds a database; DatasetB holds a file repository.

1. A snapshot of DatasetA is taken.
2. Both datasets are updated (database update, file added to repository).
3. A snapshot of DatasetB is taken.

Your datasets are now out of sync. The database update is not saved, so the file added to the repo is orphaned.

Example 2:

DatasetA holds a database; DatasetB holds a file repository.

1. A snapshot of DatasetB is taken.
2. Both datasets are updated (database update, file added to repository).
3. A snapshot of DatasetA is taken.

Your datasets are now out of sync. The database update is saved, but the file repo update is not, leaving orphaned data in the database. This is worse than the first example.

Possible fix:

Take all the snapshots in a single command. Since all the snapshot names are already pushed to an array in sanoid, this should need only minor changes near `system($zfs, "snapshot", "$snap");`. This also needs fixing in syncoid, somewhere around `my $snapcmd = "$rhost $mysudocmd $zfscmd snapshot $fs\@$snapname\n";`. This should not change expiring.
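ZFS creates all snapshots named in a single `zfs snapshot` invocation atomically, so collapsing the per-dataset calls into one command gets the desired behavior. A minimal sketch of building that command line from a list of datasets; the dataset and snapshot names here are hypothetical placeholders, and the command is printed rather than executed:

```shell
#!/bin/sh
# Hypothetical inputs; sanoid would supply these from its config/array.
datasets="rpool/vm-100-disk-1 rpool/vm-100-disk-2"
snapname="autosnap_hourly"

# Name every snapshot in one zfs invocation so they are all
# created together (all-or-nothing) instead of one at a time.
cmd="zfs snapshot"
for ds in $datasets; do
    cmd="$cmd ${ds}@${snapname}"
done
echo "$cmd"
```

Running the real command would leave both datasets' snapshots consistent with each other, eliminating both failure orderings described above.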

redmop commented 8 years ago

The above fix may bump up against the maximum shell command length when there are many snapshots with long names. We have 2097152 characters to work with on Linux, according to `getconf ARG_MAX`.
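A guard for that limit could be as simple as comparing the assembled command's length against `getconf ARG_MAX` before invoking it. A sketch, with a hypothetical short command line standing in for the real one:

```shell
#!/bin/sh
# Compare the assembled command length against the kernel's
# argument-size limit; split into multiple invocations only if needed
# (accepting that atomicity is lost across the split).
limit=$(getconf ARG_MAX)
cmd="zfs snapshot rpool/vm-100-disk-1@autosnap_hourly rpool/vm-100-disk-2@autosnap_hourly"
if [ "${#cmd}" -lt "$limit" ]; then
    verdict="single atomic invocation"
else
    verdict="must split into chunks"
fi
echo "$verdict"
```

In practice even hundreds of fully-qualified snapshot names fall far short of two million characters, so the split path should be rare.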

Sanoid fix:

Change the groupings in the config file to be logical groupings, and specify the member datasets in a new datasets option.

[VM_100]
    # space-separated, to make it easier to build command lines
    datasets = "rpool/vm-100-disk-1 rpool/vm-100-disk-2"
    # or some other separator, like ',', to handle datasets with spaces in the name
    datasets = "rpool/vm-100-disk-1,rpool/vm-100-disk-2"
    use_template = production
    recursive = yes
    process_children_only = yes
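Parsing the comma-separated variant of that proposed datasets option is straightforward: splitting on commas means dataset names containing spaces survive intact. A sketch, with the option value and snapshot name as hypothetical inputs:

```shell
#!/bin/sh
# Hypothetical value of the proposed "datasets" config option.
datasets="rpool/vm-100-disk-1,rpool/vm-100-disk-2"
snapname="autosnap_hourly"

# Split on commas only, so a name like "rpool/my data" would
# still be treated as a single dataset.
cmd="zfs snapshot"
old_ifs=$IFS
IFS=','
for ds in $datasets; do
    cmd="$cmd ${ds}@${snapname}"
done
IFS=$old_ifs
echo "$cmd"
```

Either separator yields the same single atomic invocation; the comma form just removes the ambiguity for unusual dataset names.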
Syncoid fix:

Use recursive snapshots, and send/recv the whole tree in one operation.
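ZFS already provides both halves of this: `zfs snapshot -r` snapshots a dataset and all its descendants atomically, and `zfs send -R` produces a replication stream covering the whole tree. A sketch of the two commands syncoid might issue; the pool, host, and snapshot names are hypothetical, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Hypothetical source/destination names; syncoid would supply these.
src="rpool/vm-100"
dst="backup/vm-100"
snap="autosnap_hourly"

# One recursive snapshot covers every child dataset atomically;
# one replicating send/recv moves the whole tree in a single stream.
snap_cmd="zfs snapshot -r ${src}@${snap}"
send_cmd="zfs send -R ${src}@${snap} | ssh root@target zfs recv -F ${dst}"
echo "$snap_cmd"
echo "$send_cmd"
```

This keeps all child datasets consistent with one another on both the source and the destination, which is exactly the property the per-dataset approach loses.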