--insecure-direct-connection is undocumented

The documentation that does exist is very sparse and incomplete. I had to read the source to figure out how it works.

The usage text only says: --insecure-direct-connection=IP:PORT[,IP:PORT] WARNING: DATA IS NOT ENCRYPTED. First address pair is for connecting to the target and the second for listening at the target

README.md has more, but it is still incomplete:

--insecure-direct-connection=IP:PORT[,IP:PORT,[TIMEOUT,[mbuffer]]]

    WARNING: This is an insecure option as the data is not encrypted while being sent over the network. Only use if you trust the complete network path.
    Use a direct tcp connection (with socat and busybox nc/mbuffer) for the actual zfs send/recv stream. All control commands are still executed via the ssh connection. The first address pair is used for connecting to the target host from the source host and the second pair is for listening on the target host. If the later isn't provided the same as the former is used. This can be used for saturating high throughput connection like >= 10GBe network which isn't easy with the overhead off ssh. It can also be useful for encrypted datasets to lower the cpu usage needed for replication but be aware that metadata is NOT ENCRYPTED in this case. The default timeout is 60 seconds and can be overridden by providing it as third argument. By default busybox nc is used for the listeing tcp socket, if mbuffer is preferred specify its name as fourth argument
but be aware that mbuffer listens on all interfaces and uses an optionally provided ip address for access restriction (This option can't be used for relaying between two remote hosts)

How it actually works:

- The argument is split on commas into 1 to 4 separate parameters.
- The 1st IP:PORT parameter is used at the source as the target address to connect to (it can also be hostname:PORT). PORT is required, as it is also used as the listening port on the receiving end if the 2nd parameter is omitted.
- The 2nd IP:PORT parameter is used at the target as the listening address (it can be just a PORT, in which case it listens on all addresses), EXCEPT when the 4th parameter is the word mbuffer. In that case it is a filter on the receiving side that only allows connections from that SOURCE IP/hostname (the :PORT is still used as the listening port, but on all addresses).
- The 3rd parameter is the connection timeout in seconds. It defaults to 60, but it must be set explicitly if you want to pass the 4th parameter, mbuffer.
- The 4th parameter can be mbuffer, which switches the receiving side from busybox nc to mbuffer. It also changes the meaning of the 2nd IP:PORT parameter, because that parameter is then passed to mbuffer instead of nc.
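
To make the splitting rules above concrete, here is a rough Python sketch of how I understand them. It is not syncoid's actual code (syncoid is written in Perl); the function name parse_direct_connection and the returned keys are made up for illustration, and treating an empty 2nd or 3rd field as "not provided" is my assumption.

```python
def parse_direct_connection(value):
    """Split an --insecure-direct-connection value into its 1 to 4 parts.

    Hypothetical helper that mirrors the behaviour described in this issue;
    it is not taken from syncoid.
    """
    parts = value.split(",")
    if not 1 <= len(parts) <= 4 or not parts[0]:
        raise ValueError("expected IP:PORT[,IP:PORT[,TIMEOUT[,mbuffer]]]")

    connect = parts[0]
    # Per the README, the 2nd parameter falls back to the 1st when omitted.
    # Assumption: an empty field counts as omitted.
    listen = parts[1] if len(parts) > 1 and parts[1] else connect
    # The timeout defaults to 60 seconds (assumption: also when left empty).
    timeout = int(parts[2]) if len(parts) > 2 and parts[2] else 60
    # Only the literal word "mbuffer" in 4th position switches the mode.
    use_mbuffer = len(parts) > 3 and parts[3] == "mbuffer"

    return {
        "connect": connect,
        "listen": listen,
        "timeout": timeout,
        "use_mbuffer": use_mbuffer,
    }


if __name__ == "__main__":
    print(parse_direct_connection("remote.example:7000,source.example:7000,60,mbuffer"))
```

Note that the sketch does not change the type of the 2nd parameter in mbuffer mode; it is the same string either way, and only the tool that receives it interprets it differently.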

When the switch is enabled:

- On the source, data is piped into socat - TCP:$1,retry=$3,interval=1 ($1 is the 1st parameter; $3 is the 3rd parameter, or 60 by default). socat retries the connection at 1s intervals for up to $3 seconds (60s by default).
- On the destination:
  - If mbuffer is not enabled, data is received using busybox nc -l $2 -w $3 ($2 is the 2nd parameter; $3 is the 3rd parameter, or 60 by default). Here, $2 is the listening address and port (or just a port, to listen on all addresses). The receiving socket times out after $3 seconds (60s by default).
  - If mbuffer is enabled, data is received using mbuffer -W $3 -I $2 (plus the other options that are normally passed to mbuffer even without insecure-direct-connection, such as source-bwlimit). The mbuffer documentation describes -I <port> as: "Use network port port as input instead of the standard input. If given a hostname and a port in the form hostname:port, only the given host is allowed to connect." Thus $2 changes from being the listening address and port to being a source address filter plus listening port. The receiving socket times out after $3 seconds (60s by default).
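
As a quick way to see which command lines result from a given set of parameters, here is a small Python sketch that assembles the strings described above. The function name build_commands is hypothetical, and the mbuffer line leaves out the additional options syncoid normally passes to it, so treat the output as an approximation rather than what syncoid really runs.

```python
def build_commands(connect, listen, timeout=60, use_mbuffer=False):
    """Return (source_cmd, target_cmd) for the data stream.

    Hypothetical helper based on the description in this issue; the real
    mbuffer invocation carries extra options that are omitted here.
    """
    # Source side: socat retries once per second, `timeout` times.
    source_cmd = f"socat - TCP:{connect},retry={timeout},interval=1"

    if use_mbuffer:
        # `listen` acts as [allowed-host:]port for mbuffer -I.
        target_cmd = f"mbuffer -W {timeout} -I {listen}"
    else:
        # `listen` is [address:]port for busybox nc.
        target_cmd = f"busybox nc -l {listen} -w {timeout}"

    return source_cmd, target_cmd


if __name__ == "__main__":
    for cmd in build_commands("remote.example:7000", "8000"):
        print(cmd)
```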

Additionally, each command is checked for existence on the local and remote host. syncoid will die with an error if any of the following is true:

- socat is not available on the source
- busybox is not available on the destination (when not in mbuffer mode)
- mbuffer is not available on the destination (when in mbuffer mode)
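
The checks above boil down to a small lookup of which tool is needed on which side. A Python sketch of that logic follows; required_tools and missing_tools are names I made up, and the sets of available commands are passed in by hand rather than detected the way syncoid does it.

```python
def required_tools(use_mbuffer=False):
    """Tools needed on each side, per the rules listed in this issue."""
    return {
        "source": ["socat"],
        "target": ["mbuffer" if use_mbuffer else "busybox"],
    }


def missing_tools(available, use_mbuffer=False):
    """Return the tools that are required but absent, per side.

    `available` maps "source"/"target" to the set of command names found
    on that host (hypothetical input, supplied by the caller).
    """
    needed = required_tools(use_mbuffer)
    return {
        side: [tool for tool in tools if tool not in available.get(side, set())]
        for side, tools in needed.items()
    }


if __name__ == "__main__":
    found = {"source": {"socat"}, "target": {"busybox"}}
    # In mbuffer mode the target above would be missing mbuffer.
    print(missing_tools(found, use_mbuffer=True))
```

Any non-empty list in the result corresponds to a case where syncoid would die with an error.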

Some examples:

- --insecure-direct-connection=remote.example:7000 : socat on the source connects to remote.example:7000, and the remote listens with busybox nc on remote.example:7000.
- --insecure-direct-connection=remote.example:7000,8000 : as above, but the remote listens on port 8000 (for example, if the remote has NAT for port 7000->8000).
- --insecure-direct-connection=remote.example:7000,192.0.2.3:7000 : as above, but the remote listens on its private IP (for example, if it is behind NAT and remote.example is its public DNS name, so it can't listen directly on it).
- --insecure-direct-connection=remote.example:7000,7000,60,mbuffer : the source connects as above, and the remote listens on port 7000 on all addresses using mbuffer, with a 60s timeout.
- --insecure-direct-connection=remote.example:7000,source.example:7000,60,mbuffer : as above, but the remote only accepts connections from source.example and ignores all other source addresses.

Any hostname can be given as a DNS name or an IP address, as preferred.

I hope this accurately describes how it works.
