oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

Too many simultaneous SSH connections #376

Closed Baughn closed 3 years ago

Baughn commented 6 years ago

When backing up 12 datasets to the same machine, I got this error:

Sep 06 19:30:00 madoka znapzend[12174]: sending snapshots from tank/home/vindex to znapzend@brage.info:stash/backups/madoka/home/vindex
Sep 06 19:30:00 madoka znapzendstart[26778]: ssh_exchange_identification: Connection closed by remote host
Sep 06 19:30:00 madoka znapzendstart[26778]: ssh_exchange_identification: Connection closed by remote host
Sep 06 19:30:00 madoka znapzendstart[26778]: ssh_exchange_identification: Connection closed by remote host
Sep 06 19:30:00 madoka znapzendstart[26778]: ssh_exchange_identification: Connection closed by remote host
Sep 06 19:30:00 madoka znapzendstart[26778]: ssh_exchange_identification: Connection closed by remote host
Sep 06 19:30:00 madoka znapzendstart[26778]: warning: cannot send 'tank/home/kim@2018-09-06-193000': signal received
Sep 06 19:30:00 madoka znapzend[12179]: ERROR: cannot send snapshots to stash/backups/madoka/home/kim on znapzend@brage.info
Sep 06 19:30:00 madoka znapzend[12179]: ERROR: suspending cleanup source dataset because at least one send task failed
Sep 06 19:30:00 madoka znapzend[12179]: done with backupset tank/home/kim in 0 seconds
Sep 06 19:30:00 madoka znapzend[26778]: send/receive worker for tank/home/kim done (12179)
Sep 06 19:30:01 madoka znapzendstart[26778]: ssh_exchange_identification: Connection closed by remote host
Sep 06 19:30:01 madoka znapzendstart[26778]: warning: cannot send 'tank/home/minecraft@2018-09-06-193000': signal received
Sep 06 19:30:01 madoka znapzend[12154]: ERROR: cannot send snapshots to stash/backups/madoka/home/minecraft on znapzend@brage.info
Sep 06 19:30:01 madoka znapzend[12154]: ERROR: suspending cleanup source dataset because at least one send task failed

On the server end:

Sep 06 19:30:00 tsugumi sshd[98008]: drop connection #11 from [XXX.XXX.XX.XX]:34234 on [10.19.2.5]:22 past MaxStartups

(This line appeared five times.)

This is a simple bug. To quote from man sshd_config:

       MaxStartups
              Specifies the maximum number of concurrent unauthenticated connections to the SSH daemon. Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection. The default is 10:30:100.

              Alternatively, random early drop can be enabled by specifying the three colon separated values start:rate:full (e.g. "10:30:60"). sshd(8) will refuse connection attempts with a probability of rate/100 (30%) if there are currently start (10) unauthenticated connections. The probability increases linearly and all connection attempts are refused if the number of unauthenticated connections reaches full (60).
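
To make the quoted start:rate:full rule concrete, here is a minimal Python sketch of the refusal probability as the man page describes it. The function name `drop_probability` is hypothetical. The straight-line ramp from rate% at `start` to 100% at `full` is an assumption based on the phrase "increases linearly", not a copy of sshd's own code.

```python
def drop_probability(unauthenticated, start=10, rate=30, full=100):
    """Approximate chance (0.0 to 1.0) that sshd refuses a new connection.

    `unauthenticated` is the number of connections that have not yet
    finished authenticating. Defaults mirror the quoted 10:30:100.
    """
    if unauthenticated < start:
        return 0.0  # below the threshold nothing is dropped
    if unauthenticated >= full:
        return 1.0  # at or above `full` everything is refused
    # Assumed linear ramp: rate% at `start`, rising to 100% at `full`.
    ramp = (unauthenticated - start) / (full - start)
    return (rate + (100 - rate) * ramp) / 100
```

Under this reading, with the default 10:30:100 about a third of new connections are refused once ten are still authenticating, which is consistent with a burst of 12 simultaneous sends losing several of them.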

Which is to say, it's caused by znapzend starting work on every dataset simultaneously, so enough unauthenticated SSH connections hit the server at once to cross the MaxStartups threshold. Given that this would also induce a lot of seeks and poor performance on spinning disks, I believe the best solution would be to work on only one dataset at a time. Alternatively, if bandwidth utilization is a concern, a limit of one or two streams per target host would be an option.
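
The per-host limit proposed above could look roughly like the following Python sketch. It is not znapzend code (znapzend is written in Perl). `run_limited` and the `send` callback are hypothetical names, and `send(host, dataset)` stands in for whatever actually performs one send/receive.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_limited(jobs, send, per_host=2):
    """Run send(host, dataset) for every job, with at most `per_host`
    transfers to the same host in flight at any moment.

    `jobs` is an iterable of (host, dataset) pairs.
    """
    jobs = list(jobs)
    # One semaphore per destination host, created before any thread starts.
    slots = {host: threading.BoundedSemaphore(per_host) for host, _ in jobs}

    def worker(job):
        host, dataset = job
        with slots[host]:  # blocks while the host is at its limit
            return send(host, dataset)

    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        # pool.map returns results in the same order as `jobs`.
        return list(pool.map(worker, jobs))
```

Setting `per_host=1` gives the strictly sequential behaviour, while `per_host=2` keeps the link busy without opening a dozen connections at once.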

Harvie commented 5 years ago

Maybe just tune your .ssh/config?

https://puppet.com/blog/speed-up-ssh-by-reusing-connections

Baughn commented 5 years ago

@Harvie Tried that, doesn't work. There seems to be a race condition: the second and subsequent ssh commands see the socket but declare it nonfunctional (probably because the first hasn't finished connecting yet).
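
One way to sidestep such a race, offered only as an untested sketch, is to open the shared connection first and start the other ssh commands only after it is up. The helper names `master_command` and `client_command` are hypothetical and znapzend does not provide them. The ssh options used (`ControlMaster`, `ControlPath`, `ControlPersist`, `-N`, `-f`) are standard OpenSSH ones.

```python
def master_command(host, control_path, persist_seconds=60):
    """argv that opens the shared connection for later commands to reuse."""
    return [
        "ssh",
        "-o", "ControlMaster=yes",
        "-o", f"ControlPath={control_path}",
        "-o", f"ControlPersist={persist_seconds}",
        "-N",  # run no remote command, only hold the connection
        "-f",  # go to the background once the connection is set up
        host,
    ]


def client_command(host, control_path, remote_command):
    """argv that reuses the existing socket instead of creating its own."""
    return [
        "ssh",
        "-o", "ControlMaster=no",
        "-o", f"ControlPath={control_path}",
        host,
        *remote_command,
    ]
```

A caller would run `subprocess.run(master_command(...), check=True)` once, wait for it to return, and only then launch the per-dataset commands built by `client_command`. Note that sshd's `MaxSessions` setting (default 10) caps how many sessions can share one connection, so 12 simultaneous datasets could still exceed it.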

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.