Closed by sdettmer 7 months ago
I'll try this patch:
--- /usr/sbin/syncoid.dist 2021-04-01 17:41:44.000000000 +0200
+++ /usr/sbin/syncoid 2024-04-01 12:02:30.824282463 +0200
@@ -1488,7 +1488,7 @@
 if ($rhost ne "") {
     if ($remoteuser eq 'root' || $args{'no-privilege-elevation'}) { $isroot = 1; } else { $isroot = 0; }
     # now we need to establish a persistent master SSH connection
-    $socket = "/tmp/syncoid-$rhost-" . time();
+    $socket = "/tmp/syncoid-$$-$rhost-" . time();
     open FH, "$sshcmd -M -S $socket -o ControlPersist=1m $args{'sshport'} $rhost exit |";
     close FH;
@sdettmer This is already fixed in master: https://github.com/jimsalterjrs/sanoid/blob/19fc237476452bfa7499e6dfda77a8a6eee20b4f/syncoid#L1790
Hello,
thank you for sharing this great tool and for all the effort put into it!
I wrote systemd units for each pool (using different --identifier=EXTRA parameters), but sometimes I see errors like
ControlSocket /tmp/syncoid-root@pve-1711926601 already exists, disabling multiplexing
Often (but not always), I see other errors related to that, like
CRITICAL ERROR: zfs send -I 'tank1/home'@'syncoid_nas-datenklo2_2024-04-01:01:10:02-GMT02:00' 'tank1/home'@'syncoid_nas-datenklo2_2024-04-01:04:10:02-GMT02:00' | lzop | mbuffer -R 100M -q -s 128k -m 16M 2>/dev/null | ssh -S /tmp/syncoid-root@pve-1711937401 root@nas ' mbuffer -r 40M -q -s 128k -m 16M 2>/dev/null | lzop -dfc | zfs receive -s -F '"'"'tank1/datenklo/homel'"'"' 2>&1' failed: 65280 at /usr/sbin/syncoid line 817.
If I understood correctly, 65280 is just Perl's raw wait status for "subprocess exited with code 255" (65280 >> 8 = 255; exit code 1 would show up as 256), and 255 is what ssh itself returns when it hits an error. Are these follow-up errors caused by the control socket error?
I don't understand what it means. In the remote logs I see that snapshots are created and old ones are pruned around the same time (by sanoid), and that the SSH connection was later disconnected by the peer.
I think 1711926601 is a timestamp. The file name contains neither the --identifier=EXTRA value nor a pool name nor a PID, so syncoid processes for different pools that start in the same second against the same host will share a ControlSocket filename, which is probably bad.
Would it help to add a PID or a random number, or even better to use a temp file generator as discussed in #532, which seems to have similar proposals and a patch? As that patch was apparently not accepted, maybe it is not that simple, or maybe I have a different issue?
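To double-check the reading of 65280: Perl's `$?` holds the raw wait status, with the exit code in the high byte and the terminating signal (if any) in the low 7 bits. Here is a minimal sketch of that arithmetic in shell. The function name `decode_wait_status` is only for illustration and is not part of syncoid.

```shell
#!/bin/sh
# Sketch only: decode a raw wait status as Perl reports it in $?.
# decode_wait_status is a hypothetical helper, not part of syncoid.
decode_wait_status() {
    status=$1
    exit_code=$(( status >> 8 ))   # high byte: exit code of the child
    signal=$(( status & 127 ))     # low 7 bits: terminating signal, 0 if none
    echo "exit=$exit_code signal=$signal"
}

decode_wait_status 65280   # prints "exit=255 signal=0"
decode_wait_status 256     # prints "exit=1 signal=0"
```

So 65280 decodes to exit code 255, which is the code ssh returns when ssh itself fails, while an exit code of 1 would appear as 256.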
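To make the naming question concrete, here is a rough shell sketch of the three options: the current host-plus-timestamp name, the same name with a PID added, and a socket placed inside a directory created by `mktemp -d`. All helper names are hypothetical, and the sketch assumes a `mktemp` that accepts a template argument (as the GNU and BSD versions do). It is not how syncoid actually implements this.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of syncoid.

# Current naming: host plus epoch seconds. Two runs started in the same
# second against the same host get the same path.
socket_by_time() {   # args: rhost timestamp
    printf '/tmp/syncoid-%s-%s\n' "$1" "$2"
}

# Naming with the PID added: concurrently running processes have
# different PIDs, so their paths differ.
socket_by_pid() {    # args: pid rhost timestamp
    printf '/tmp/syncoid-%s-%s-%s\n' "$1" "$2" "$3"
}

# Alternative: let mktemp create a fresh private directory and put the
# socket inside it, so the path is unique and hard to predict.
socket_in_tmpdir() {
    dir=$(mktemp -d "${TMPDIR:-/tmp}/syncoid-XXXXXX") || return 1
    printf '%s/control\n' "$dir"
}

socket_by_time 'root@pve' 1711926601
socket_by_pid "$$" 'root@pve' 1711926601
sock=$(socket_in_tmpdir) && echo "$sock" && rmdir "$(dirname "$sock")"
```

The first call reproduces the path from the error message above. A path from either of the other two helpers could then be passed to `ssh -M -S <path> -o ControlPersist=1m ...` in the same way as the current one.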