mutagen-io / mutagen

Fast file synchronization and network forwarding for remote development
https://mutagen.io

Mutagen sync create will fail to find ssh even when present #495

Open mitchty opened 7 months ago

mitchty commented 7 months ago

Note: I'm not sure whether this is Mutagen's problem or possibly an issue with the nixpkgs-packaged version. Mostly I just need some help debugging where the issue lies. Details:

Here's the issue (note that it fails the same way even without MUTAGEN_SSH_PATH; I'm just noting the behavior):

$ MUTAGEN_SSH_PATH=/run/current-system/sw/bin/ mutagen sync create --name=test ~/src/tmp $USER@srv.home.arpa:/tmp/test
Connecting to agent (POSIX)...                                                  
Error: unable to connect to beta: unable to connect to endpoint: unable to dial agent endpoint: unable to create agent command: unable to set up SSH invocation: unable to identify 'ssh' command: exec: "ssh": executable file not found in $PATH
zsh: exit 1     env MUTAGEN_SSH_PATH=/run/current-system/sw/bin/ mutagen sync create --name=test 
$ ssh srv.home.arpa "which ssh; echo \$PATH; ssh --version"                      
/run/current-system/sw/bin/ssh
/run/wrappers/bin:/home/mitch/.nix-profile/bin:/nix/profile/bin:/home/mitch/.local/state/nix/profile/bin:/etc/profiles/per-user/mitch/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
unknown option -- -
usage: ssh [-46AaCfGgKkMNnqsTtVvXxYy] [-B bind_interface] [-b bind_address]
           [-c cipher_spec] [-D [bind_address:]port] [-E log_file]
           [-e escape_char] [-F configfile] [-I pkcs11] [-i identity_file]
           [-J destination] [-L address] [-l login_name] [-m mac_spec]
           [-O ctl_cmd] [-o option] [-P tag] [-p port] [-R address]
           [-S ctl_path] [-W host:port] [-w local_tun[:remote_tun]]
           destination [command [argument ...]]
       ssh [-Q query_option]
zsh: exit 255   ssh srv.home.arpa "which ssh; echo \$PATH; ssh --version"

I presume "beta" in the error refers to the host being connected to (srv.home.arpa in this instance). Does that mean srv.home.arpa is the side reporting the error?

Some more detail on how this is laid out on NixOS systems: the last directory in $PATH is simply full of symlinks to binaries in the Nix store. Note that both sides have literally the same paths and binaries, since they're built from the same flake source input derivations (think configuration).

$ ls -dl $(which ssh)      
lrwxrwxrwx 2 root root 65 Dec 31  1969 /run/current-system/sw/bin/ssh -> /nix/store/dx9w909f6hnpwkaqgalfdph5i9cdj5h0-openssh-9.6p1/bin/ssh
$ ls -dl $(readlink -f $(which ssh))      
-r-xr-xr-x 2 root root 1045280 Dec 31  1969 /nix/store/dx9w909f6hnpwkaqgalfdph5i9cdj5h0-openssh-9.6p1/bin/ssh
$ file !$     
file $(readlink -f $(which ssh))
/nix/store/dx9w909f6hnpwkaqgalfdph5i9cdj5h0-openssh-9.6p1/bin/ssh: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/anlf335xlh41yjhm114swi87406mq5pw-glibc-2.38-44/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped

I looked through the source a bit and thought this might be related to how ssh is located via the exec.LookPath function. As best I can tell, though, whatever Mutagen does to connect to the agent is either setting PATH incorrectly or somehow not inheriting a valid PATH in its environment.

I threw this together quickly to test that assumption:

$ cat ssh.go     
package main

import (
        "log"
        "os/exec"
)

func main() {
        path, err := exec.LookPath("ssh")
        if err != nil {
                log.Fatal(err)
        } else {
                log.Fatal(path)
        }
}

I see the following behavior, which largely matches. However, when I don't override PATH at all, exec.LookPath does succeed and picks up the correct path for ssh, so if that's what the Mutagen agent does remotely, I would expect things to work.

$ go build ssh.go                                                                  
$ ssh srv.home.arpa env PATH= ~/src/tmp/ssh                                                       
2024/04/19 20:56:15 exec: "ssh": executable file not found in $PATH
zsh: exit 1     ssh srv.home.arpa env PATH= ~/src/tmp/ssh
$ ssh srv.home.arpa PATH= ~/src/tmp/ssh                                                                        
2024/04/19 20:56:21 exec: "ssh": executable file not found in $PATH
zsh: exit 1     ssh srv.home.arpa PATH= ~/src/tmp/ssh
$ ssh srv.home.arpa PATH=/sw/current-system/sw/bin ~/src/tmp/ssh                                        
2024/04/19 20:56:40 exec: "ssh": executable file not found in $PATH
zsh: exit 1     ssh srv.home.arpa PATH=/sw/current-system/sw/bin ~/src/tmp/ssh
$ ssh srv.home.arpa PATH=/sw/current-system/sw/bin:/usr/bin ~/src/tmp/ssh                       
2024/04/19 20:56:56 exec: "ssh": executable file not found in $PATH
zsh: exit 1     ssh srv.home.arpa PATH=/sw/current-system/sw/bin:/usr/bin ~/src/tmp/ssh
$ ssh srv.home.arpa ~/src/tmp/ssh                                         
2024/04/19 20:57:01 /run/current-system/sw/bin/ssh
zsh: exit 1     ssh srv.home.arpa ~/src/tmp/ssh

So I suppose this boils down to two questions: how should I debug this further, and would it pay off to make this PATH lookup optional somehow? If ssh isn't in the PATH on the remote or local server, exec() will just fail with ENOENT anyway, so I'm not entirely sure what the lookup buys Mutagen. For reference:

$ strace -e execve -f env PATH=/run/current-system/sw/bin ssh -V
execve("/home/mitch/.nix-profile/bin/env", ["env", "PATH=/run/current-system/sw/bin", "ssh", "-V"], 0x7ffdc22e4dc0 /* 96 vars */) = 0
execve("/run/current-system/sw/bin/ssh", ["ssh", "-V"], 0x7ffd0e0cd7e0 /* 96 vars */) = 0
OpenSSH_9.6p1, OpenSSL 3.0.13 30 Jan 2024
+++ exited with 0 +++
$ strace -e execve -f env PATH=/not/valid ssh -V
execve("/home/mitch/.nix-profile/bin/env", ["env", "PATH=/not/valid", "ssh", "-V"], 0x7ffec607f9d0 /* 96 vars */) = 0
execve("/not/valid/ssh", ["ssh", "-V"], 0x7fff9fee6020 /* 96 vars */) = -1 ENOENT (No such file or directory)
env: ‘ssh’: No such file or directory
+++ exited with 127 +++
zsh: exit 127   strace -e execve -f env PATH=/not/valid ssh -V

I could break out perf or eBPF and look at the execve() calls of any ssh children if that helps, but I figured I'd open an issue first to see if I'd missed anything obvious.

Would it be a worthwhile endeavor to disable the lookup and custom-compile Mutagen so that it just exec()s ssh, hoping for a usable PATH? Enough squirrel banter, though: Mutagen is great as is; I just need some tips on where to debug things further.