psy0rz / zfs_autobackup

ZFS autobackup is used to periodically backup ZFS filesystems to other locations. Easy to use and very reliable.
https://github.com/psy0rz/zfs_autobackup
GNU General Public License v3.0
596 stars 63 forks

Feature Request: Direct mbuffer transfer #15

Closed devopstales closed 1 year ago

devopstales commented 4 years ago

Currently a dataset can be sent through ssh with mbuffer, but mbuffer can also listen on a TCP port for the data stream. If we send the data directly to this TCP port, without the compression and encryption of ssh, it is much faster than going through ssh. For this solution we need to start mbuffer on the destination before we can start sending data.

mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF remote-zfs/my-vm
zfs send local-zfs/my-vm | mbuffer -s 128k -m 1G -O remote-server:9090

psy0rz commented 4 years ago

This is non-trivial to implement in a clean way. Perhaps it would help if you specify a faster cipher like arcfour in ~/.ssh/config. (https://github.com/psy0rz/zfs_autobackup#specifying-ssh-port-or-options)

(If you're a company and want us to implement it, contact us for a quote)

psy0rz commented 3 years ago

This has become more trivial with the latest changes. If anyone wants this, vote it up and I might add it.

devopstales commented 3 years ago

+1

digitalsignalperson commented 2 years ago

+1 upvote for this and possibly other transport options. An option for fast, secure transport on 10G+ networks could be

digitalsignalperson commented 2 years ago

hey @psy0rz thoughts on opening this issue back up, or I'd be happy to open a new one for discussion

psy0rz commented 2 years ago

yes, because of the process handling extensions made for zfs-autoverify, this should be doable.

however, are socat and spiped with encryption faster than regular ssh pipes? (i understand that direct mbuffer over plain tcp is, of course)

digitalsignalperson commented 2 years ago

I'll see if I can do some tests on my 10Gbit setup to see how they compare

mbuffer and netcat can also do the same job, I'm not sure which of the two is higher performance. In this syncoid PR there is some back and forth between mbuffer and nc, and there may be some arguments in favor of nc https://github.com/jimsalterjrs/sanoid/pull/513

however, even without using SSL, socat may in fact be faster than netcat. A benchmark here (albeit very old, from 2008) found socat faster than netcat https://wiki.atlas.aei.uni-hannover.de/ATLAS/ZFSBenchmarkTest

digitalsignalperson commented 2 years ago

either way, regardless of the tool, they would all have similar setups and usages

maybe there is a flexible way to define scripts that can plug in any of the options

psy0rz commented 2 years ago

it might even already be doable via --send-pipe and --recv-pipe, but maybe hackish and problematic with buildup/teardown.

mbuffer is universally supported in operating systems. but we can support multiple tools.

wishdev commented 2 years ago

Just wanted to add that, indeed, --send-pipe and --recv-pipe work for this concept.

I used netcat but the following options worked for me - not an ssh connection in sight for send/recv and no issues with setup or any processes hanging around. Seemed very clean.

--send-pipe "nc server_name 8023"
--recv-pipe "nc -l -p 8023"

psy0rz commented 2 years ago

awesome, thanks for the info!

dberlin commented 2 years ago

One of the tricky parts of this is finding an unused port that is safe to listen on and tell the other machine to connect to. Particularly in a portable way. It's doable, but annoying and possibly slow (IE it degrades to trying to listen on every port starting at until the kernel gives you one).

If that was implemented in zfs-autobackup in python, and made available as a variable to send/recv pipe (IE as $FREEPORT or something), that would make things like nc/mbuffer a lot easier and safer
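
For illustration, here is a minimal Python sketch of how a free port could be obtained from the kernel by binding to port 0. This is not code from zfs-autobackup, and `find_free_port` is a hypothetical helper name. There is also an unavoidable race: another process can grab the port between this check and the moment nc or mbuffer starts listening.

```python
import socket


def find_free_port(host: str = "") -> int:
    """Ask the kernel for a currently unused TCP port (hypothetical helper).

    Binding to port 0 lets the kernel choose a free port, so no
    port-by-port probing is needed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


# The socket is closed again on return, so the port is only *likely*
# to still be free when the real listener is started.
print(find_free_port())
```

This only tells you that the port is free on the machine where it runs. It says nothing about firewalls or port forwards between source and target.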

digitalsignalperson commented 2 years ago

fwiw did some testing on localhost of transport options throughput (not yet piped through zfs_autobackup)

Impressed with the simplicity and speed of netcat + age

Didn't test mbuffer because it's not in the arch linux repos

Edit: Added ssh. Not that bad.

digitalsignalperson commented 2 years ago

actually I think ssh wins after all (unless you want unencrypted, then plain netcat)

Checking my supported ciphers with ssh -Q cipher

Default chacha20-poly1305@openssh.com: getting ~500MB/sec
Trying aes128-ctr: getting ~850MB/sec

Always good to measure!! I just assumed based on what I read that ssh was gonna be slow...
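
A measurement like this could be scripted roughly as follows. This is a sketch under assumptions, not the method used for the numbers above: `measure_throughput` is a hypothetical helper, the host `localhost` in the commented example is a placeholder, and the ssh example assumes key-based login works without a prompt.

```python
import os
import subprocess
import time
from typing import Sequence


def measure_throughput(command: Sequence[str],
                       total_bytes: int = 256 * 1024 * 1024,
                       chunk_size: int = 1024 * 1024) -> float:
    """Pipe total_bytes into the command's stdin and return MB/s."""
    # Random data, so compression in the transport cannot inflate the result.
    chunk = os.urandom(chunk_size)
    proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.DEVNULL)
    start = time.perf_counter()
    sent = 0
    while sent < total_bytes:
        size = min(chunk_size, total_bytes - sent)
        proc.stdin.write(chunk[:size])
        sent += size
    proc.stdin.close()        # flush and signal end of stream
    returncode = proc.wait()  # wait until the command has consumed everything
    elapsed = time.perf_counter() - start
    if returncode != 0:
        raise RuntimeError(f"{command[0]} exited with status {returncode}")
    return sent / elapsed / 1e6


# Example (placeholder host "localhost"), comparing the two ciphers above:
# for cipher in ("chacha20-poly1305@openssh.com", "aes128-ctr"):
#     cmd = ["ssh", "-c", cipher, "localhost", "cat > /dev/null"]
#     print(cipher, round(measure_throughput(cmd)), "MB/s")
```

The result includes process startup and connection setup time, so larger values of `total_bytes` give more representative numbers.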

psy0rz commented 2 years ago

ssh is pretty ok nowadays. :)

badamson001 commented 1 year ago

Hi, I'm new to this project and was trying to get the netcat example above working but running into some confusion.

  1. Is it necessary to specify ssh-source or ssh-target if I'm using the --send-pipe and --recv-pipe?
  2. Would someone be kind enough to post a full --send-pipe --recv-pipe usage so I can see what I am misunderstanding?

psy0rz commented 1 year ago

  1. Yes, you still need ssh-source or ssh-target just like you normally would. All the other stuff is still done via ssh, as well as setting up the nc.

  2. Just get zfs-autobackup to work correctly in the regular way, after that try adding something like:

--send-pipe "nc server_name 8023" --recv-pipe "nc -l -p 8023"

The server_name is the name of the target machine, the port is an arbitrary free port you choose. Make sure there isn't any firewall in the way.
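
To show how the pieces fit together, here is a small Python sketch that assembles one possible complete command line for a push backup. The backup name `offsite1` and the target path `backup/pool` are made-up placeholders, `build_backup_command` is a hypothetical helper, and the sketch assumes the source is the local machine and the target is reached with --ssh-target.

```python
import shlex
from typing import List


def build_backup_command(backup_name: str, target_path: str,
                         target_host: str, port: int) -> List[str]:
    """Build a zfs-autobackup argv that moves the stream over plain TCP."""
    return [
        "zfs-autobackup",
        # ssh is still used for everything except the data stream itself.
        "--ssh-target", target_host,
        # Runs on the source: connect to the listener on the target.
        "--send-pipe", f"nc {target_host} {port}",
        # Runs on the target: listen on the chosen port.
        "--recv-pipe", f"nc -l -p {port}",
        backup_name,
        target_path,
    ]


cmd = build_backup_command("offsite1", "backup/pool", "server_name", 8023)
print(shlex.join(cmd))
```

Printing the command instead of running it makes it easy to check the quoting of the two pipe arguments before trying it for real.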

psy0rz commented 1 year ago

> One of the tricky parts of this is finding an unused port that is safe to listen on and tell the other machine to connect to. Particularly in a portable way. It's doable, but annoying and possibly slow (IE it degrades to trying to listen on every port starting at until the kernel gives you one).
>
> If that was implemented in zfs-autobackup in python, and made available as a variable to send/recv pipe (IE as $FREEPORT or something), that would make things like nc/mbuffer a lot easier and safer

No, that's too hackish and too much feature creep for this project, I think. Because then you would still have issues with firewalls or port forwards, for example. Too much magic isn't good. :)

It's best to let the admin choose a fixed port and make sure that this port is reachable from the source.
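
A quick way to check reachability from the source is a plain TCP connect. The sketch below is not part of zfs-autobackup, and `port_is_reachable` is a hypothetical helper. Nothing listens on the port between backups, so a temporary listener (for example `nc -l -p 8023`) has to be running on the target while you test.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


# Example with the placeholder names used above:
# print(port_is_reachable("server_name", 8023))
```

A result of False while the listener is running on the target usually points at a firewall or a missing port forward somewhere in between.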

psy0rz commented 1 year ago

updated docs, so this one can be closed

https://github.com/psy0rz/zfs_autobackup/wiki/Performance#direct-tcp-network-transfer