hunleyd / btrfs-auto-snapshot

BTRFS Automatic Snapshot Service for Linux
GNU General Public License v2.0

How to deal with multiple snapshots per second? #10

Open ams-tschoening opened 2 years ago

ams-tschoening commented 2 years ago

I came across this in the past already: snapshots are created with a resolution of one minute, while especially during tests one might need to create snaps far more often. Waiting a minute during tests is cumbersome. However, when not waiting with the original code, various problems can occur. On my older Ubuntu 18.03 I got error messages about read-only file systems and the snapshots simply weren't created. While the error message was difficult to understand because it lacked any useful context, it didn't really do any harm.

When implementing #9, OTOH, I had the feeling that the behaviour of the BTRFS tools changed in those cases. Whenever a snapshot target directory already exists, it seems that BTRFS simply adds the directory name of the snapshot source as an additional child directory. This lets the snapshot creation complete, but is a problem when snapshotting / itself, as that isn't a valid child name. This leads to the following error message:

localhost:/usr/local/btrfs-auto-snapshot # ./btrfs-auto-snapshot --keep=50 //
localhost:/usr/local/btrfs-auto-snapshot # ./btrfs-auto-snapshot --keep=2 //
ERROR: invalid snapshot name '/'
localhost:/usr/local/btrfs-auto-snapshot # ./btrfs-auto-snapshot --keep=20 //

But even if things succeed, the automatically created directory means that the former snapshot directory is no longer a pure snapshot directory, but contains additional content. This makes deleting the snapshot directory as a snapshot fail; instead, one needs to delete the content first to empty the directory and can then delete the snapshot itself as usual. Otherwise the following error is printed:

localhost:/usr/local/btrfs-auto-snapshot # ./btrfs-auto-snapshot --keep=5 //
ERROR: Could not destroy subvolume/snapshot: Directory not empty
ERROR: Could not destroy subvolume/snapshot: Directory not empty
[...]

Though, according to my tests, deleting snapshots with rm -r * seems to work as well now. I did that on the problematic snapshots, and both the additional content and the snap dirs themselves were properly removed. And really removed, not only "hidden" or the like, as btrfs subvolume list / didn't show them anymore.

I've implemented a workaround that simply deletes a possibly already existing snapshot and then creates it again. The argument is that, with the current limitation in place, it makes tests easier, and if one creates new snapshots, the user's interest is to have the most current one per minute instead of the oldest one. Keeping the oldest would especially defeat tests, as simply nothing would be done when a snap already exists for that minute, while creating them might be the test itself.

log notice "$( ${dry_run} btrfs subvolume delete -c "${snap_path}" 2> '/dev/null' )"
log notice "$( ${dry_run} btrfs subvolume snapshot   ${snap_opts} )"

So, we have the following options in the long term:

  1. keep things as they are: delete+create
  2. look for existing snaps and don't create new ones.
  3. add seconds to the directory names, which is enough most likely.
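
For comparison, options 2 and 3 could be sketched roughly as follows. This is only an illustration under assumptions: snap_name and create_unless_exists are hypothetical helpers, and the prefix and timestamp layout are made up here, not taken from btrfs-auto-snapshot.

```shell
#!/usr/bin/env bash
# Sketch only: snap_name and create_unless_exists are hypothetical helpers,
# not functions from btrfs-auto-snapshot.

# Option 3: build a snapshot name with a resolution of one second.
# The default prefix and the timestamp layout are assumptions.
snap_name() {
    local prefix="${1:-btrfs-auto-snap}"
    printf '%s_%s\n' "${prefix}" "$(date +'%Y-%m-%d-%H%M%S')"
}

# Option 2: do nothing when the target path already exists.
# All arguments after the path form the command that creates the snapshot.
create_unless_exists() {
    local snap_path="$1"
    shift
    if [ -e "${snap_path}" ]; then
        echo "skip: ${snap_path} already exists"
        return 0
    fi
    "$@"
}
```

With a hypothetical target such as target="/.btrfs/$(snap_name)", a call could look like create_unless_exists "${target}" btrfs subvolume snapshot -r / "${target}". Note that option 2 keeps the oldest snapshot for a given name, which is the opposite of delete+create.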
ams-tschoening commented 2 years ago

This has nothing to do with changes in the BTRFS tools; I simply introduced a spelling error when checking command line arguments. This made the default read-only snaps writable, and hence additional subdirs could be created within existing ones. I fixed that, but the underlying root cause of multiple snaps per minute remains.

hunleyd commented 2 years ago

add seconds to the directory names.

I would not be opposed to this.

What do you think, @mwt?

ams-tschoening commented 1 year ago
  1. keep things as they are: delete+create

We have daylight saving time in Germany, and last night the clock was set back from 3 to 2 o'clock. This resulted in CRON firing the events to create snapshots with the same time twice, and on my systems I didn't have the delete+create workaround in place yet. Therefore I received error messages like the following, because the snaps already existed:

ERROR: invalid snapshot name '/'

Even with snap names containing seconds the same problem would have happened, because CRON would most likely fire at the exact same second, and creating the snaps is fast enough to finish within that second as well. The only way to prevent this is either delete+create as implemented now or not creating new snaps at all. Though, while the latter might make sense for tests within the same second, it doesn't for an hour repeated by a time switch, as one would lose an hour of newly created snaps.
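
The name collision can be reproduced without waiting for a time switch. The following is a minimal sketch, assuming GNU date (for -d @EPOCH) and a timestamp layout that includes seconds; local_snap_stamp is a hypothetical helper, 2022-10-30 is just an example date for the switch back, and the TZ value is a POSIX rule string for CET/CEST.

```shell
#!/usr/bin/env bash
# Sketch only: local_snap_stamp is a hypothetical helper and the timestamp
# layout is an assumption. Requires GNU date for "-d @EPOCH".

# Format an epoch timestamp as local wall-clock time under CET/CEST rules
# (switch back to standard time on the last Sunday of October at 03:00).
local_snap_stamp() {
    TZ='CET-1CEST,M3.5.0,M10.5.0/3' date -d "@$1" +'%Y-%m-%d-%H%M%S'
}

# Two runs exactly one real hour apart on 2022-10-30 (example date):
first="$(local_snap_stamp 1667089800)"   # 00:30:00 UTC, still CEST
second="$(local_snap_stamp 1667093400)"  # 01:30:00 UTC, already CET

echo "${first}"
echo "${second}"
```

Both lines show the same local timestamp even though the runs are an hour apart, so a name derived from local time collides with or without seconds.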

So, in my opinion delete+create should simply be kept anyway, and the only discussion left is whether to additionally add seconds or not.