These are policy-driven snapshot management and replication tools which use OpenZFS for underlying next-gen storage. (Btrfs support plans are shelved unless and until btrfs becomes reliable.)
The documentation that does exist is very sparse and incomplete. I had to read the source to figure out how it works.
The usage text only says: --insecure-direct-connection=IP:PORT[,IP:PORT] WARNING: DATA IS NOT ENCRYPTED. First address pair is for connecting to the target and the second for listening at the target
README.md has more, but it is still incomplete:
--insecure-direct-connection=IP:PORT[,IP:PORT,[TIMEOUT,[mbuffer]]]
WARNING: This is an insecure option as the data is not encrypted while being sent over the network. Only use if you trust the complete network path.
Use a direct tcp connection (with socat and busybox nc/mbuffer) for the actual zfs send/recv stream. All control commands are still executed via the ssh connection. The first address pair is used for connecting to the target host from the source host and the second pair is for listening on the target host. If the later isn't provided the same as the former is used. This can be used for saturating high throughput connection like >= 10GBe network which isn't easy with the overhead off ssh. It can also be useful for encrypted datasets to lower the cpu usage needed for replication but be aware that metadata is NOT ENCRYPTED in this case. The default timeout is 60 seconds and can be overridden by providing it as third argument. By default busybox nc is used for the listeing tcp socket, if mbuffer is preferred specify its name as fourth argument
but be aware that mbuffer listens on all interfaces and uses an optionally provided ip address for access restriction (This option can't be used for relaying between two remote hosts)
How it actually works is:
the argument is split around commas into 1 to 4 separate parameters
the first IP:PORT parameter is used at the source as the target address/hostname (it can also be hostname:PORT, PORT is required as it's used on the receive end as the listening port)
the 2nd IP:PORT parameter is used at the target as the listening address (it can be just a PORT, in which case it listens on all addresses), EXCEPT in the special case when the 4th item is the word mbuffer, in which case it's a filter on the receive side that only allows connections from that SOURCE IP/hostname (the :PORT is still used as the listening port, but on all addresses)
the 3rd parameter is the connection timeout in seconds (it defaults to 60, but it must be set if you want to pass the 4th parameter mbuffer)
the 4th parameter can be mbuffer, which switches the receive side from using busybox nc to using mbuffer. It also changes the meaning of the 2nd IP:PORT parameter, because that parameter is then passed to mbuffer instead of nc
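The splitting rules above can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, not syncoid's actual code (syncoid is written in Perl); the names `DirectConnection` and `parse_direct_connection` are made up for this example, and the handling of empty fields is an assumption.

```python
from typing import NamedTuple


class DirectConnection(NamedTuple):
    connect_to: str    # 1st parameter: address the source connects to
    listen_on: str     # 2nd parameter: falls back to the 1st if omitted
    timeout: int       # 3rd parameter: seconds, defaults to 60
    use_mbuffer: bool  # True only if the 4th parameter is the word "mbuffer"


def parse_direct_connection(value: str) -> DirectConnection:
    """Split the option value around commas into 1 to 4 parameters."""
    parts = value.split(",")
    if not 1 <= len(parts) <= 4 or not parts[0]:
        raise ValueError("expected IP:PORT[,IP:PORT[,TIMEOUT[,mbuffer]]]")

    connect_to = parts[0]
    # Assumption: an empty 2nd or 3rd field is treated like a missing one.
    listen_on = parts[1] if len(parts) > 1 and parts[1] else connect_to
    timeout = int(parts[2]) if len(parts) > 2 and parts[2] else 60
    use_mbuffer = len(parts) > 3 and parts[3] == "mbuffer"
    return DirectConnection(connect_to, listen_on, timeout, use_mbuffer)


if __name__ == "__main__":
    print(parse_direct_connection("remote.example:7000,7000,60,mbuffer"))
```

Note that the timeout has to be spelled out whenever mbuffer is requested, because the parameters are purely positional.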
When the switch is enabled:
on the source, data is piped into socat - TCP:$1,retry=$3,interval=1 ($1 is the 1st parameter; $3 is the 3rd parameter, defaulting to 60). socat retries the connection at 1s intervals for up to $3 seconds (60s by default).
on the destination:
if mbuffer is not enabled, data is received using: busybox nc -l $2 -w $3 ($2 is the 2nd parameter, $3 the 3rd or default to 60). Here, $2 is the listening address and port (or just port on all addresses). The receiving socket times out after $3 or 60s.
if mbuffer is enabled, data is received using mbuffer -W $3 -I $2 (plus the other options that are normally used with mbuffer even without insecure-direct-connection, like source-bwlimit). The mbuffer documentation describes the option as: "-I <port> Use network port port as input instead of the standard input. If given a hostname and a port in the form hostname:port, only the given host is allowed to connect." Thus $2 changes from being the listening address and port to being a source address filter plus listening port. The receiving socket times out after $3 seconds (60s by default).
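To make the two receive modes easier to compare, here is a small Python sketch that assembles the command lines quoted above. The function names are hypothetical, and the sketch leaves out the extra mbuffer options (buffer size, bandwidth limits and so on) that syncoid adds in practice.

```python
from typing import List


def source_command(connect_to: str, timeout: int = 60) -> List[str]:
    """Command run on the source; the zfs send stream is piped into it."""
    return ["socat", "-", f"TCP:{connect_to},retry={timeout},interval=1"]


def destination_command(listen_on: str, timeout: int = 60,
                        use_mbuffer: bool = False) -> List[str]:
    """Command run on the destination; its output is piped into zfs recv."""
    if use_mbuffer:
        # With mbuffer, listen_on is "port" or "allowed-host:port".
        return ["mbuffer", "-W", str(timeout), "-I", listen_on]
    # With busybox nc, listen_on is "port" or "listen-address:port".
    return ["busybox", "nc", "-l", listen_on, "-w", str(timeout)]


if __name__ == "__main__":
    print(" ".join(source_command("remote.example:7000")))
    print(" ".join(destination_command("7000")))
    print(" ".join(destination_command("source.example:7000", use_mbuffer=True)))
```

The same second parameter is passed through unchanged in both modes; only the tool that interprets it differs, which is why its meaning flips.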
Additionally, each command is checked for existence on the local and remote host. syncoid will die with an error if any of the following is true:
if socat is not available on the source
if busybox is not available on the destination and not in mbuffer mode
if mbuffer is not available on the destination and in mbuffer mode
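The three conditions reduce to a simple rule: socat is always needed on the source, and the destination needs exactly one of busybox or mbuffer depending on the mode. A minimal Python sketch of that rule follows. The helper names are invented for illustration, and `missing_tools` only checks the local machine, whereas syncoid checks the remote side over its ssh connection.

```python
import shutil
from typing import Dict, List


def required_tools(use_mbuffer: bool) -> Dict[str, List[str]]:
    """Tools that must exist on each side for a direct connection."""
    return {
        "source": ["socat"],
        "destination": ["mbuffer" if use_mbuffer else "busybox"],
    }


def missing_tools(tools: List[str]) -> List[str]:
    """Return the tools from the list that are not on the local PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    needed = required_tools(use_mbuffer=False)
    print("missing on this host:", missing_tools(needed["source"]))
```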
Some examples:
--insecure-direct-connection=remote.example:7000
socat on the source connects to remote.example:7000, remote listens with busybox nc on remote.example:7000
--insecure-direct-connection=remote.example:7000,8000
as above but remote listens on port 8000 (for example if the remote has NAT for port 7000->8000)
--insecure-direct-connection=remote.example:7000,192.0.2.3:7000
as above but remote listens on private IP (for example if it's behind NAT, remote.example is its public DNS name, so it can't listen directly on it)
--insecure-direct-connection=remote.example:7000,7000,60,mbuffer
source as above, remote listens on port 7000 on all addresses using mbuffer with 60s timeout
--insecure-direct-connection=remote.example:7000,source.example:7000,60,mbuffer
as above but remote also accepts only connections from source.example and ignores all other source addresses
Any hostname can be a DNS name or an IP address, as preferred.
I hope this accurately describes how it works.