oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

The `mbuffer` settings relate to the remote system only, is this right? #629

Closed (jimklimov closed this 5 months ago)

jimklimov commented 5 months ago

At least, this is what I see in practice in the znapzend logs (wrapped for readability), e.g.:

# zfs send -Lce -I 'rpool/home/abuild@znapzend-auto-2024-01-16T00:00:00Z' \
    'rpool/home/abuild@znapzend-auto-2024-01-16T11:51:05Z'\
    |ssh -o batchMode=yes -o ConnectTimeout=30 znapzend \
        'mbuffer -q -s 256k -W 600 -m 128M\
        |zfs recv -u -F pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild'

...although the documentation examples (in the man page embedded in znapzendzetup) seem to imply that this is (or originally was?) about the sender's local mbuffer.

On one hand, having it remote-only constrains the destination host(s): mbuffer must be installed there, and the run-time resources (like RAM) for it must be available there.

On the other hand, if the main goal of mbuffer is to level out the burstiness of the original ZFS send stream generation (and/or, to an extent, of its consumption on the other side), so that the sender is not always blocked on the receiver and vice versa, then the mbuffer may as well run on the source system (assuming network speed is roughly constant).

Running the buffer on the sender also allows for more predictable use of RAM: the sender can control how many streams it is sending and how large their buffers are, but it cannot control how many different systems are currently backing up into the same destination server, nor the impact of the many buffers spawned only there.

In fact, with manual replications I often end up having both (to level out network lags): `zfs send | mbuffer | ssh "mbuffer | zfs recv"`
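The two-sided pipeline above can be sketched as a small shell helper that only assembles the command string, so it can be inspected before anything is run. The function name `build_send_cmd` and its argument order are hypothetical; the `mbuffer` options (`-q -s 256k -m ...`) are copied from the log quoted earlier, and the snapshot, host, and dataset names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a zfs send/recv pipeline with
# an mbuffer on both the sending and the receiving side.
# Usage: build_send_cmd FROM_SNAP TO_SNAP SSH_HOST DST_DATASET [BUF_SIZE]
build_send_cmd() {
    from_snap=$1
    to_snap=$2
    host=$3
    dst=$4
    size=${5:-128M}   # buffer size, defaults to the 128M seen in the log

    # Same mbuffer invocation on both ends; only the size is configurable here.
    mbuf="mbuffer -q -s 256k -m $size"

    printf '%s\n' "zfs send -I '$from_snap' '$to_snap' | $mbuf | ssh $host \"$mbuf | zfs recv -u -F '$dst'\""
}

# Placeholder names, for illustration only.
build_send_cmd 'pool/ds@a' 'pool/ds@b' backuphost 'tank/backup/ds'
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without ZFS; the output can be reviewed and then pasted into a shell.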

This issue is posted to begin a discussion about perhaps adding another group of settings (src_mbuffer and src_mbuffer_size?) to optionally use that instead of (or in addition to) an mbuffer on the destination system.

Technically, it could be more correct to track independent dst_N_mbuffer(_size) settings and keep the current one for source, but this might break some deployments upon upgrade?..

Finally, note that there may be local destinations, and running two mbuffers talking to each other on the same host is overkill. Although... what if the user's znapzendzetup calls for it? Maybe warn, but honour their choice.
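The "warn, but honour their choice" idea could look roughly like the sketch below. It is not znapzend code (znapzend itself is written in Perl). The function names are hypothetical, and two assumptions are baked in: a remote destination is written as `[user@]host:dataset`, and the value `off` means that no mbuffer is configured.

```shell
#!/bin/sh
# Assumption: a destination is remote if the part before the first "/"
# contains a ":" (as in "root@bserv:backup/home"); otherwise it is local.
dst_is_remote() {
    head=${1%%/*}
    case $head in
        *:*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Hypothetical check: warn on stderr when both a source-side and a
# destination-side mbuffer are requested for a local destination.
# Always returns 0, i.e. the user's configuration is honoured.
warn_double_mbuffer() {
    dst=$1
    src_mbuffer=$2   # path or "off" (assumed convention)
    dst_mbuffer=$3   # path or "off" (assumed convention)
    if ! dst_is_remote "$dst" && [ "$src_mbuffer" != off ] && [ "$dst_mbuffer" != off ]; then
        echo "WARNING: two mbuffers on the same host for local destination '$dst'" >&2
    fi
    return 0
}

warn_double_mbuffer 'tank/backup/ds' /usr/bin/mbuffer /usr/bin/mbuffer
```

The check only reports; whether such a configuration should ever be rejected outright is left open, as in the discussion above.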

jimklimov commented 5 months ago

At least, I can confirm the observed (maybe not "desired") behavior in the codebase.

And per git blame, this remote-ness of mbuffer goes back to the first commits (v0.0.1): https://github.com/oetiker/znapzend/blob/16467ee623bae2fbd373dc7b38d7918992e38114/lib/ZnapZend/ZFS.pm#L60-L76 with an explicit "check if executable is available on remote host" at https://github.com/oetiker/znapzend/blob/16467ee623bae2fbd373dc7b38d7918992e38114/lib/ZnapZend/Config.pm#L125-L130

So to minimize surprises in the field, any change here should honour that singular setting of the mbuffer path name (used for each destination, and for the sender in port-to-port mode) unless it is overridden by the newly defined src_* and dst_N_* variants, and the singular setting should be documented as deprecated...
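A minimal sketch of that fallback order, written as shell functions purely for illustration (znapzend itself is Perl, and none of this exists in its codebase). The function names are hypothetical, the `src_*` / `dst_N_*` settings are only proposals from this issue, and `off` is assumed to be the "no mbuffer" value.

```shell
#!/bin/sh
# Destination side: a per-destination override (proposed dst_N_mbuffer) wins,
# otherwise the existing singular mbuffer setting applies, otherwise "off".
# Usage: pick_dst_mbuffer DST_N_MBUFFER LEGACY_MBUFFER
pick_dst_mbuffer() {
    specific=$1
    legacy=$2
    if [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "$legacy" ]; then
        printf '%s\n' "$legacy"
    else
        printf '%s\n' off
    fi
}

# Source side: the proposed src_mbuffer wins; the singular setting is used
# for the sender only in port-to-port mode, matching current behavior.
# Usage: pick_src_mbuffer SRC_MBUFFER LEGACY_MBUFFER PORT_MODE(yes|no)
pick_src_mbuffer() {
    specific=$1
    legacy=$2
    port_mode=$3
    if [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ "$port_mode" = yes ] && [ -n "$legacy" ]; then
        printf '%s\n' "$legacy"
    else
        printf '%s\n' off
    fi
}

# Existing deployment with only the singular setting: the destination keeps
# using it, and the sender stays unbuffered outside port-to-port mode.
pick_dst_mbuffer '' /usr/bin/mbuffer
pick_src_mbuffer '' /usr/bin/mbuffer no
```

With this order, a configuration that sets only the old property behaves exactly as before the change, which is the point of keeping it as a fallback.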