jepsen-io / jepsen

A framework for distributed systems verification, with fault injection

Make docker-compose work as non-root. #444

Closed stevana closed 3 years ago

stevana commented 4 years ago

I'm trying to make the docker-compose setup work as non-root so that I can mount my ~/.m2. That way I can use locally installed libraries without fetching the dependencies every time, and without Docker writing into that directory as root.

That part works. However, SSHing into the nodes doesn't:

clojure.lang.ExceptionInfo: throw+: {:dir "/", :private-key-path nil, :password "root", :username "root", :type :jepsen.control/session-error, :port 22, :strict-host-key-checking false, :host nil, :sudo nil, :dummy nil, :message "Error opening SSH session. Verify username, password, and node hostnames are correct.", :session nil}
        at slingshot.support$stack_trace.invoke(support.clj:201) ~[knossos-0.3.6.jar:na]
        at jepsen.control.SSHRemote$fn__3007.invoke(control.clj:342) ~[jepsen-0.1.17.jar:na]
        at jepsen.control.SSHRemote.connect(control.clj:339) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$session$fn__3022.invoke(control.clj:370) ~[jepsen-0.1.17.jar:na]
        at jepsen.reconnect$open_BANG_$fn__2784.invoke(reconnect.clj:59) ~[jepsen-0.1.17.jar:na]
        at jepsen.reconnect$open_BANG_.invokeStatic(reconnect.clj:57) ~[jepsen-0.1.17.jar:na]
        at jepsen.reconnect$open_BANG_.invoke(reconnect.clj:54) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$session.invokeStatic(control.clj:369) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$session.invoke(control.clj:365) ~[jepsen-0.1.17.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:142) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
        at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:137) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invoke(core.clj:660) ~[clojure-1.10.1.jar:na]
        at jepsen.util$fcatch$wrapper__2044.doInvoke(util.clj:37) ~[jepsen-0.1.17.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.10.1.jar:na]
        at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) ~[jepsen-0.1.17.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
        at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.10.1.jar:na]
        at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.10.1.jar:na]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]
Caused by: com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:512) ~[jsch-0.1.53.jar:na]
        at com.jcraft.jsch.Session.connect(Session.java:183) ~[jsch-0.1.53.jar:na]
        at clj_ssh.ssh$fn__2480.invokeStatic(ssh.clj:118) ~[jepsen-0.1.17.jar:na]
        at clj_ssh.ssh$fn__2480.invoke(ssh.clj:115) ~[jepsen-0.1.17.jar:na]
        at clj_ssh.ssh.protocols$fn__2438$G__2405__2447.invoke(protocols.clj:4) ~[jepsen-0.1.17.jar:na]
        at clj_ssh.ssh$connect.invokeStatic(ssh.clj:401) ~[jepsen-0.1.17.jar:na]
        at clj_ssh.ssh$connect.invoke(ssh.clj:397) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$clj_ssh_session.invokeStatic(control.clj:331) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$clj_ssh_session.invoke(control.clj:317) ~[jepsen-0.1.17.jar:na]
        at jepsen.control.SSHRemote$fn__3007.invoke(control.clj:340) ~[jepsen-0.1.17.jar:na]
        ... 34 common frames omitted
ERROR [2020-02-26 10:46:44,676] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
clojure.lang.ExceptionInfo: throw+: {:dir "/", :private-key-path nil, :password "root", :username "root", :type :jepsen.control/session-error, :port 22, :strict-host-key-checking false, :host nil, :sudo nil, :dummy nil, :message "Error opening SSH session. Verify username, password, and node hostnames are correct.", :session nil}
        at slingshot.support$stack_trace.invoke(support.clj:201) ~[knossos-0.3.6.jar:na]
        at jepsen.control.SSHRemote$fn__3007.invoke(control.clj:342) ~[jepsen-0.1.17.jar:na]
        at jepsen.control.SSHRemote.connect(control.clj:339) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$session$fn__3022.invoke(control.clj:370) ~[jepsen-0.1.17.jar:na]
        at jepsen.reconnect$open_BANG_$fn__2784.invoke(reconnect.clj:59) ~[jepsen-0.1.17.jar:na]
        at jepsen.reconnect$open_BANG_.invokeStatic(reconnect.clj:57) ~[jepsen-0.1.17.jar:na]
        at jepsen.reconnect$open_BANG_.invoke(reconnect.clj:54) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$session.invokeStatic(control.clj:369) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$session.invoke(control.clj:365) ~[jepsen-0.1.17.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:142) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
        at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:137) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invoke(core.clj:660) ~[clojure-1.10.1.jar:na]
        at jepsen.util$fcatch$wrapper__2044.doInvoke(util.clj:37) ~[jepsen-0.1.17.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.10.1.jar:na]
        at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) ~[jepsen-0.1.17.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.10.1.jar:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.10.1.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
        at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.10.1.jar:na]
        at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.10.1.jar:na]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]
Caused by: com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:512) ~[jsch-0.1.53.jar:na]
        at com.jcraft.jsch.Session.connect(Session.java:183) ~[jsch-0.1.53.jar:na]
        at clj_ssh.ssh$fn__2480.invokeStatic(ssh.clj:118) ~[jepsen-0.1.17.jar:na]
        at clj_ssh.ssh$fn__2480.invoke(ssh.clj:115) ~[jepsen-0.1.17.jar:na]
        at clj_ssh.ssh.protocols$fn__2438$G__2405__2447.invoke(protocols.clj:4) ~[jepsen-0.1.17.jar:na]
        at clj_ssh.ssh$connect.invokeStatic(ssh.clj:401) ~[jepsen-0.1.17.jar:na]
        at clj_ssh.ssh$connect.invoke(ssh.clj:397) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$clj_ssh_session.invokeStatic(control.clj:331) ~[jepsen-0.1.17.jar:na]
        at jepsen.control$clj_ssh_session.invoke(control.clj:317) ~[jepsen-0.1.17.jar:na]
        at jepsen.control.SSHRemote$fn__3007.invoke(control.clj:340) ~[jepsen-0.1.17.jar:na]
        ... 34 common frames omitted

However, if I run docker exec -it jepsen-control bash and then ssh -o StrictHostKeyChecking=no root@n1 hostname, it works.

I can't think of anything in my change that would cause this to break. Any hints?

stevana commented 4 years ago

I also checked that jepsen-control's public key is in the authorized_keys of all the nodes, e.g.:

kyle@control:~$ cat .ssh/id_rsa.pub
ssh-rsa AAAAB3N [...]

kyle@control:~$ ssh root@n1 cat /root/.ssh/authorized_keys
ssh-rsa AAAAB3N [...]

stevana commented 4 years ago

Managed to reproduce it with clj-ssh and got some more debug info:

kyle@control:/jepsen$ lein repl
nREPL server started on port 42715 on host 127.0.0.1 - nrepl://127.0.0.1:42715
REPL-y 0.4.3, nREPL 0.6.0
Clojure 1.10.1
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~16.04-b08
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

smartlog.main=> (use 'clj-ssh.cli)
nil
smartlog.main=> (default-session-options {:strict-host-key-checking :no})
{:strict-host-key-checking :no}
smartlog.main=> (ssh "n1" "ls" :username "root")
15:21:56.840 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Connecting to n1 port 22
15:21:56.854 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Connection established
15:21:56.869 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Remote version string: SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u7
15:21:56.869 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Local version string: SSH-2.0-JSCH-0.1.53
15:21:56.869 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
15:21:56.938 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - CheckKexes: diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521
15:21:57.020 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - CheckSignatures: ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
15:21:57.021 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - SSH_MSG_KEXINIT sent
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - SSH_MSG_KEXINIT received
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server: none,zlib@openssh.com
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server: none,zlib@openssh.com
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server:
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server:
15:21:57.022 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client: none
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client: none
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client:
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client:
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: server->client aes128-ctr hmac-sha1 none
15:21:57.023 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - kex: client->server aes128-ctr hmac-sha1 none
15:21:57.029 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - SSH_MSG_KEX_ECDH_INIT sent
15:21:57.029 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - expecting SSH_MSG_KEX_ECDH_REPLY
15:21:57.036 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - ssh_rsa_verify: signature true
15:21:57.039 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Host 'n1' is known and matches the RSA host key
15:21:57.039 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - SSH_MSG_NEWKEYS sent
15:21:57.039 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - SSH_MSG_NEWKEYS received
15:21:57.042 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - SSH_MSG_SERVICE_REQUEST sent
15:21:57.043 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - SSH_MSG_SERVICE_ACCEPT received
15:21:57.043 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Authentications that can continue: publickey,keyboard-interactive,password
15:21:57.043 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Next authentication method: publickey
15:21:57.046 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Authentications that can continue: password
15:21:57.046 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Next authentication method: password
15:21:57.047 [nRepl-session-880cca20-78c4-4ee8-9c31-1f5990246f03] DEBUG clj-ssh.ssh - Disconnecting from n1 port 22
Execution error (JSchException) at com.jcraft.jsch.Session/connect (Session.java:512).
Auth fail
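
To make the authentication sequence in the debug output easier to see, the attempted methods can be pulled out of the log. This is a minimal sketch, assuming the log is piped in or saved to a file; the function name auth_methods_tried is made up for illustration, and the session ids in the sample lines are shortened.

```shell
# Hypothetical helper: print, in order, the authentication methods
# that JSch reports trying ("Next authentication method: ..." lines).
auth_methods_tried() {
  sed -n 's/.*Next authentication method: //p'
}

# Two lines taken from the debug output above (session id shortened).
auth_methods_tried <<'EOF'
15:21:57.043 [nRepl-session] DEBUG clj-ssh.ssh - Next authentication method: publickey
15:21:57.046 [nRepl-session] DEBUG clj-ssh.ssh - Next authentication method: password
EOF
```

In this log, publickey is tried first, after which the server only offers password; the session then disconnects with Auth fail.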

stevana commented 4 years ago

OK, almost got it now. I need to pass --ssh-private-key /home/kyle/.ssh/id_rsa, but why?

aphyr commented 4 years ago

Might be that your SSH agent isn't presenting the key?
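
One way to check this from inside the control container is ssh-add -l, which lists the keys the agent currently holds. A minimal sketch follows; describe_agent_status is a hypothetical helper, and the exit statuses it interprets are the ones OpenSSH's ssh-add uses (0 on success, 1 on failure such as an empty agent, 2 when no agent can be contacted).

```shell
# Hypothetical helper: turn the exit status of `ssh-add -l` into a message.
describe_agent_status() {
  case "$1" in
    0) echo "agent has at least one key loaded" ;;
    1) echo "agent is reachable but holds no keys" ;;
    2) echo "could not contact an agent" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Capture the status without aborting if the shell runs with `set -e`.
status=0
ssh-add -l >/dev/null 2>&1 || status=$?
describe_agent_status "$status"
```

If the agent turns out to be empty or unreachable, that would be consistent with publickey authentication failing unless a key path is passed explicitly.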

stevana commented 4 years ago

The problem was the line ssh-add /root/.ssh/id_rsa &> /dev/null in docker/control/bashrc.
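
That line hardcodes root's key path and discards any error output, so a failure when running as another user goes unnoticed. The following is a sketch of one way to avoid both; it is an assumption about a possible fix, not necessarily what the change in this PR does, and key_path and add_key are hypothetical names.

```shell
# Hypothetical helper: build the key path from a home directory
# (defaults to the current user's $HOME instead of /root).
key_path() {
  printf '%s/.ssh/id_rsa\n' "${1:-$HOME}"
}

# Hypothetical helper: add the key to the agent, but report a missing
# or unreadable key instead of hiding the failure behind /dev/null.
add_key() {
  key="$(key_path "$1")"
  if [ ! -r "$key" ]; then
    echo "ssh key not readable: $key" >&2
    return 1
  fi
  ssh-add "$key"
}

# Example: show which key would be used for the user kyle from this thread.
key_path /home/kyle
```

Called with no argument, add_key uses the home directory of whichever user the shell is running as, so the same bashrc would work for both root and a non-root user.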

stevana commented 4 years ago

In case it wasn't clear: all issues have been resolved here and this is ready for review.