Note: I'm not sure whether this is a mutagen problem or a problem with the nixpkgs-packaged version. Mostly I just need some help debugging where the issue lies. Details:
mutagen 0.17.1
This only crops up when syncing nixos->nixos. Syncing macos->nixos (using the nixpkgs mutagen on the mac too) works fine, which strikes me as meaning the package itself is probably fine. (I have no proof of this yet, just noting that it does work.)
I'm unclear why mutagen needs the full path to ssh before exec()'ing it, and why a PATH lookup needs to happen at all versus just exec()'ing it.
Here's the issue (note that it dies the same way even without MUTAGEN_SSH_PATH set; just noting the behavior):
I presume the error means that beta, i.e. the host being connected to (srv.home.arpa in this instance), is the side reporting it?
Some more detail on how this is laid out on NixOS systems: the last dir in $PATH is just full of symlinks to the binaries in the nix store. Note that both sides have literally the same paths/binaries, as they're built off the same flake source input derivations (think configuration).
$ ls -dl $(which ssh)
lrwxrwxrwx 2 root root 65 Dec 31 1969 /run/current-system/sw/bin/ssh -> /nix/store/dx9w909f6hnpwkaqgalfdph5i9cdj5h0-openssh-9.6p1/bin/ssh
$ ls -dl $(readlink -f $(which ssh))
-r-xr-xr-x 2 root root 1045280 Dec 31 1969 /nix/store/dx9w909f6hnpwkaqgalfdph5i9cdj5h0-openssh-9.6p1/bin/ssh
$ file !$
file $(readlink -f $(which ssh))
/nix/store/dx9w909f6hnpwkaqgalfdph5i9cdj5h0-openssh-9.6p1/bin/ssh: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/anlf335xlh41yjhm114swi87406mq5pw-glibc-2.38-44/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped
I did a bit of looking in the source and thought this might relate to the ssh PATH lookup done with the exec.LookPath function, but as best I can tell, whatever mutagen does to connect to the agent, it is either setting PATH wrong or not inheriting a valid PATH in its environment.
I threw this together quick to validate that assumption:
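A minimal sketch of that kind of check, assuming it does nothing more than resolve ssh with exec.LookPath and log the result. The helper name lookupWithPath and the command-line handling are illustrative only and are not taken from mutagen:

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// lookupWithPath is a hypothetical helper: it sets PATH to pathEnv for the
// duration of the call, resolves name with exec.LookPath, then restores the
// previous PATH.
func lookupWithPath(name, pathEnv string) (string, error) {
	old, had := os.LookupEnv("PATH")
	if err := os.Setenv("PATH", pathEnv); err != nil {
		return "", err
	}
	defer func() {
		if had {
			os.Setenv("PATH", old)
		} else {
			os.Unsetenv("PATH")
		}
	}()
	return exec.LookPath(name)
}

func main() {
	// Show the PATH this process actually inherited, quoted so that an
	// empty value is visible in the log.
	log.Printf("inherited PATH=%q", os.Getenv("PATH"))

	// With no arguments, resolve ssh against the inherited PATH; otherwise
	// treat each argument as a candidate PATH value to try.
	candidates := []string{os.Getenv("PATH")}
	if len(os.Args) > 1 {
		candidates = os.Args[1:]
	}

	failed := false
	for _, p := range candidates {
		resolved, err := lookupWithPath("ssh", p)
		if err != nil {
			log.Printf("PATH=%q: %v", p, err)
			failed = true
			continue
		}
		log.Printf("PATH=%q: %s", p, resolved)
	}
	if failed {
		os.Exit(1)
	}
}
```

Logging the inherited PATH alongside the lookup result makes it easier to tell a lookup failure apart from a PATH that arrived empty or pointing at the wrong directory.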
And I see the following behavior, which largely matches. exec.LookPath does succeed when I don't set PATH at all, though: it picks up the correct path for ssh, so if that's what the mutagen agent does remotely I would expect things to work.
$ go build ssh.go
$ ssh srv.home.arpa env PATH= ~/src/tmp/ssh
2024/04/19 20:56:15 exec: "ssh": executable file not found in $PATH
zsh: exit 1 ssh srv.home.arpa env PATH= ~/src/tmp/ssh
$ ssh srv.home.arpa PATH= ~/src/tmp/ssh
2024/04/19 20:56:21 exec: "ssh": executable file not found in $PATH
zsh: exit 1 ssh srv.home.arpa PATH= ~/src/tmp/ssh
$ ssh srv.home.arpa PATH=/sw/current-system/sw/bin ~/src/tmp/ssh
2024/04/19 20:56:40 exec: "ssh": executable file not found in $PATH
zsh: exit 1 ssh srv.home.arpa PATH=/sw/current-system/sw/bin ~/src/tmp/ssh
$ ssh srv.home.arpa PATH=/sw/current-system/sw/bin:/usr/bin ~/src/tmp/ssh
2024/04/19 20:56:56 exec: "ssh": executable file not found in $PATH
zsh: exit 1 ssh srv.home.arpa PATH=/sw/current-system/sw/bin:/usr/bin ~/src/tmp/ssh
$ ssh srv.home.arpa ~/src/tmp/ssh
2024/04/19 20:57:01 /run/current-system/sw/bin/ssh
zsh: exit 1 ssh srv.home.arpa ~/src/tmp/ssh
So I suppose this breaks down to two questions: how should I debug this further, and would it pay off to make this PATH lookup optional somehow? If ssh isn't in PATH on the remote or local machine, exec() will just fail with ENOENT anyway, so I'm not entirely sure what the lookup buys mutagen. For reference:
$ strace -e execve -f env PATH=/run/current-system/sw/bin ssh -V
execve("/home/mitch/.nix-profile/bin/env", ["env", "PATH=/run/current-system/sw/bin", "ssh", "-V"], 0x7ffdc22e4dc0 /* 96 vars */) = 0
execve("/run/current-system/sw/bin/ssh", ["ssh", "-V"], 0x7ffd0e0cd7e0 /* 96 vars */) = 0
OpenSSH_9.6p1, OpenSSL 3.0.13 30 Jan 2024
+++ exited with 0 +++
$ strace -e execve -f env PATH=/not/valid ssh -V
execve("/home/mitch/.nix-profile/bin/env", ["env", "PATH=/not/valid", "ssh", "-V"], 0x7ffec607f9d0 /* 96 vars */) = 0
execve("/not/valid/ssh", ["ssh", "-V"], 0x7fff9fee6020 /* 96 vars */) = -1 ENOENT (No such file or directory)
env: ‘ssh’: No such file or directory
+++ exited with 127 +++
zsh: exit 127 strace -e execve -f env PATH=/not/valid ssh -V
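For comparison on the Go side: the os/exec documentation says exec.Command resolves a bare command name with LookPath itself, and the failure surfaces when the command is started. The sketch below only illustrates that standard-library behavior; the binary name is made up, and I haven't checked how mutagen actually uses its lookup result:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// notFoundOnRun starts name through exec.Command with no explicit
// exec.LookPath call beforehand, and reports whether the resulting error
// is a PATH-resolution failure (exec.ErrNotFound).
func notFoundOnRun(name string) bool {
	err := exec.Command(name).Run()
	return errors.Is(err, exec.ErrNotFound)
}

func main() {
	// Hypothetical name chosen so that it should not exist on any PATH.
	name := "no-such-binary-for-this-sketch"
	fmt.Printf("%s reported as not found by exec.Command: %v\n", name, notFoundOnRun(name))
}
```

If that holds for the code path in question, an explicit lookup mostly changes where the error is reported, not whether the process can be started.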
I can break out perf or eBPF and look at the execve()s of any ssh children if it helps, but figured I'd open an issue first to see if I missed anything obvious.
If I disabled the lookup and custom-compiled mutagen so that it just exec()s ssh and hopes for a sane PATH, would that be a worthwhile endeavor? Enough squirrel banter though: mutagen is great as is, I just need some tips on where to debug things further.