Closed lotheac closed 4 years ago
The reason we are using -I is that in this way znapzend will pass on any intermediate snapshots it may have missed due to some network outage or because it took too long transferring a particularly big snapshot ...
To fight 'leftover' snapshots, the remote cleanup routine would have to be enhanced ... this would be a worthwhile thing by all means!
On Fri, Nov 15 2019 07:20:34 -0800, Tobias Oetiker wrote:
> The reason we are using -I is that in this way znapzend will pass on any intermediate snapshots it may have missed due to some network outage or because it took too long transferring a particularly big snapshot ...
In my opinion, if it's important for znapzend to avoid "gaps" like this, it should instead try to send each missing snapshot (according to DST retention policy) to the destination. In a policy with, for example, 1d=>1h on SRC and 1month=>1d on DST, after an outage of, say, six hours, sending five intermediate hourlies just to destroy them afterwards doesn't strike me as particularly productive.
(BTW, we were also kind of surprised by the fact that znapzend sends to DST every hour in that scenario, as opposed to once a day. But that's a separate thing and I do kind of understand the reason there anyway.)
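To make the alternative above concrete, here is a rough sketch of picking only the missed snapshots that the DST retention interval would keep anyway. It is not znapzend's actual logic: the function name, the epoch-aligned bucketing, and the "keep the newest snapshot per interval" rule are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def snapshots_worth_sending(missed, dst_interval):
    """Pick the missed snapshots a DST policy with `dst_interval` would retain.

    Assumption (hypothetical rule, not taken from znapzend): DST retains one
    snapshot per interval-sized bucket, and we choose the newest one in each.
    """
    newest_per_bucket = {}
    for taken_at in sorted(missed):
        bucket = (taken_at - EPOCH) // dst_interval  # integer bucket index
        newest_per_bucket[bucket] = taken_at  # later snapshots overwrite earlier
    return sorted(newest_per_bucket.values())


# Six hourly snapshots missed during an outage, all on the same day.
missed = [datetime(2019, 11, 15, hour) for hour in range(6, 12)]

to_send = snapshots_worth_sending(missed, timedelta(days=1))
print(to_send)
```

With a daily interval on DST only the newest of the six hourlies is selected, so the five intermediate ones would never be transferred just to be destroyed again.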
> to fight 'leftover' snapshots the remote cleanup routine would have to be enhanced ... this would be a worthwhile thing by all means !
I think a safer position to take would be that znapzend does not send, receive, or destroy any snapshots that it has not taken or sent itself, so your proposal seems to me a bit dangerous :)
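One way to picture that "safer position" is a filter that only ever hands tool-created snapshots to the cleanup step. This is a sketch under assumptions: the timestamp pattern stands in for whatever snapshot naming format is configured, and `tank/data` and `split_managed` are hypothetical names, not part of znapzend.

```python
import re

# Assumption: snapshots created by the backup tool are named with a timestamp
# such as 2019-11-15-070000. Adjust the pattern to the configured format.
MANAGED_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{6}$")


def split_managed(snapshots):
    """Separate tool-created snapshots from everything else (e.g. manual ones)."""
    managed, foreign = [], []
    for full_name in snapshots:
        # A full snapshot name looks like "dataset@snapname".
        _, _, snap_name = full_name.partition("@")
        if MANAGED_NAME.match(snap_name):
            managed.append(full_name)
        else:
            foreign.append(full_name)
    return managed, foreign


managed, foreign = split_managed([
    "tank/data@2019-11-15-060000",
    "tank/data@manual-snapshot",
    "tank/data@2019-11-15-070000",
])
print("may be cleaned up:", managed)
print("never touched:", foreign)
```

Only the entries in `managed` would be candidates for sending or destroying; `tank/data@manual-snapshot` stays out of the tool's hands.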
-- Lauri Tirkkonen | lotheac @ IRCnet
I think this is not the first time the issue has been brought up :)
From my PoV, sending such manual intermediate snapshots is a good thing, though I agree that the stance is site- (and admin-)dependent, so making this optional could be worthwhile. For me this is good because just today I had a server hiccup with too many old snapshots piled up (destination problem). The fast way out was to make the manual snapshots, have znapzend do its magic on the larger sub-trees (much I/O, so gigabytes of snaps for just megabytes of "live" data at any time), and then I knew that snapshots older than this manual one were safe to delete from the origin system, so its ZFS is no longer collapsing due to low free space and its fragmentation. A full znapzend run takes a large part of the day, if not several, on that box and its backup link, and we needed it back to life and service ASAP, so picking at the worst offenders quickly was worthwhile.
I agree this is not too common, but also not an excluded variant. Making it optional and non-default (e.g. part of runonce handling) is an option :)
On the technical side, I believe a zfs send -I | zfs recv... passing a queue of intermediate snaps can be orders of magnitude faster than a loop of truly incremental steps of one snapshot each. Both can be slower, however, than just sending the increments (maybe skipping some original snaps) that are relevant for the destination's retention policy. E.g. if you keep hourly snaps on origin and daily on backup, it is messy to send all the hourlies of the recent day to backup and then remove 23 of them there. Looking at run logs, I feel this is the logic that happens today, but am not certain.
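The three strategies compared above can be sketched by building the `zfs send` command lines without running them. The dataset `tank/data`, the snapshot names, and the helper functions are made up for illustration; only the `-I` and `-i` flags come from the discussion.

```python
def send_with_intermediates(dataset, snaps):
    """One stream carrying every snapshot after the first: zfs send -I."""
    first, last = snaps[0], snaps[-1]
    return [["zfs", "send", "-I", f"{dataset}@{first}", f"{dataset}@{last}"]]


def send_one_by_one(dataset, snaps):
    """A loop of single incremental steps: one zfs send -i per snapshot pair."""
    return [
        ["zfs", "send", "-i", f"{dataset}@{older}", f"{dataset}@{newer}"]
        for older, newer in zip(snaps, snaps[1:])
    ]


def send_for_retention(dataset, snaps, wanted_on_dst):
    """Incremental steps only between snapshots DST actually wants to keep.

    snaps[0] is assumed to be the common snapshot already present on DST.
    """
    chain = [snaps[0]] + [s for s in snaps[1:] if s in wanted_on_dst]
    return send_one_by_one(dataset, chain)


# 24 hourly snapshots on the origin; the backup only wants the newest one.
hourlies = [f"2019-11-21-{hour:02d}0000" for hour in range(24)]

single = send_with_intermediates("tank/data", hourlies)
stepwise = send_one_by_one("tank/data", hourlies)
daily_only = send_for_retention("tank/data", hourlies, {hourlies[-1]})

for label, commands in [("-I", single), ("-i loop", stepwise), ("retention", daily_only)]:
    print(label, len(commands), "command(s)")
```

The `-I` variant and the retention-aware variant both need a single invocation here, but only the latter avoids creating the 23 hourlies on the backup side in the first place.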
On Thu, Nov 21 2019 05:54:40 -0800, Jim Klimov wrote:
> On the technical side, I believe a zfs send -I | zfs recv... passing a queue of intermediate snaps can be orders of magnitude faster than a loop of truly incremental steps of one snapshot each.
Of course it is, but we are talking about an error recovery scenario here, not normal operation. Normally znapzend sends to DST as often as SRC is snapshotted.
> Both can be slower, however, than just sending the increments (maybe skipping some original snaps) that are relevant for the destination's retention policy. E.g. if you keep hourly snaps on origin and daily on backup, it is messy to send all the hourlies of the recent day to backup and then remove 23 of them there. Looking at run logs, I feel this is the logic that happens today, but am not certain.
Yes, if you take hourly snaps on SRC but want to retain 1 daily snap on DST, znapzend sends each hourly to DST on the hour and removes the previous hourly.
-- Lauri Tirkkonen | lotheac @ IRCnet
Thanks @lotheac for the good points.
I proposed a compromise at https://github.com/lotheac/znapzend/pull/1 to have both camps satisfied, and aware of pitfalls (or benefits... POV-dependent...) with zfs send -I as well ;)
Dug a bit in the history (was interested if this was something I broke, or was some recent surprise...) and found that the big -I was there from the beginning:
Presumably this PR gets superseded by #459 ;)
Hi,
it seems znapzend is using zfs send -I to send snapshots to destination. We found this to be surprising, since it includes intermediate snapshots (in our setup we utilise snapshots taken on the source for other purposes than backup, e.g. synchronizing most current state to other production nodes). Consider the following setup:
... wait for znapzend to operate the first time ...
then take a snapshot manually:
What we expected to happen here is that the manually taken snapshot is not sent to DST, but because of -I, it actually is. But there is nothing that would ever clean it up from there (even if it was eventually destroyed on source). But -I happily sends all the intermediate snapshots too so manual-snapshot also ends up on DST:
I cannot see a reason to not use -i instead, to make sure that znapzend does not put snapshots it will never destroy onto DST. So here's a diff to avoid intermediate snapshots from ending up on DST, by changing send -I to send -i.
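The difference between the two flags can be modelled with a small sketch that lists which snapshots a stream would create on DST. This only simulates the behaviour described above and does not call zfs; the function and the timestamped snapshot names are hypothetical, while `manual-snapshot` is the name used in this description.

```python
def snapshots_in_stream(src_snaps, base, target, flag):
    """List the snapshots DST would gain from `zfs send <flag> base target`.

    `src_snaps` is the source's snapshot list in creation order, and `base`
    is assumed to exist on DST already.
    """
    start = src_snaps.index(base)
    end = src_snaps.index(target)
    if start >= end:
        raise ValueError("base must be older than target")
    if flag == "-I":
        # Everything after base up to and including target.
        return src_snaps[start + 1 : end + 1]
    if flag == "-i":
        # Only the target; intermediate snapshots are skipped.
        return [target]
    raise ValueError(f"unsupported flag: {flag}")


src = ["2019-11-15-060000", "manual-snapshot", "2019-11-15-070000"]

with_big_i = snapshots_in_stream(src, src[0], src[-1], "-I")
with_small_i = snapshots_in_stream(src, src[0], src[-1], "-i")

print("-I delivers:", with_big_i)
print("-i delivers:", with_small_i)
```

In this model `-I` carries `manual-snapshot` over to DST, while `-i` delivers only the newest snapshot.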