jepsen-io / jepsen

A framework for distributed systems verification, with fault injection

Error loading native library libjnidispatch.so #539

Open rhishikeshj opened 2 years ago

rhishikeshj commented 2 years ago

Setup

A basic Jepsen test, as designed in the tutorial, running on Docker on macOS (i.e., Docker Desktop).

Expectation

lein run test should execute the single test.

Observation

Tests crash with the following exception:

ERROR [2022-07-01 11:37:47,980] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.lang.NoClassDefFoundError: Could not initialize class com.jcraft.jsch.agentproxy.usocket.JNAUSocketFactory$CLibrary
        at com.jcraft.jsch.agentproxy.usocket.JNAUSocketFactory.open(JNAUSocketFactory.java:114)
        at com.jcraft.jsch.agentproxy.connector.SSHAgentConnector.open(SSHAgentConnector.java:93)
        at com.jcraft.jsch.agentproxy.connector.SSHAgentConnector.<init>(SSHAgentConnector.java:54)
        at com.jcraft.jsch.agentproxy.ConnectorFactory.createConnector(ConnectorFactory.java:104)
        at jepsen.control.sshj$agent_proxy.invokeStatic(sshj.clj:38)
        at jepsen.control.sshj$agent_proxy.invoke(sshj.clj:36)
        at jepsen.control.sshj$auth_BANG_$fn__6739.invoke(sshj.clj:54)

I did some digging and this seems to be related to libjna, so I made sure that both libjna-jni and libjna-java are installed on the control node. I also upgraded Java to openjdk-17 in the hope that it would solve the problem. It did not.

To get to the bottom of this, I started a plain lein repl and executed:

(in-ns 'jepsen.control.sshj)
(agent-proxy)

This fails with:

Execution error (UnsatisfiedLinkError) at com.sun.jna.Native/loadNativeDispatchLibraryFromClasspath (Native.java:776).
Native library (com/sun/jna/linux-aarch64/libjnidispatch.so) not found in resource path <truncated list of all ~/.m2 resource paths>

As you can see, I am running Docker on an M1 MacBook, hence the aarch64 requirement.

I also tried manually copying libjnidispatch.so from the system (path /usr/lib/aarch64-linux-gnu/jni/libjnidispatch.system.so) to a local folder and setting LD_LIBRARY_PATH to include that folder, but this does not work either.
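The UnsatisfiedLinkError above names a classpath resource, com/sun/jna/linux-aarch64/libjnidispatch.so, so one quick check is whether any JNA jar in the local Maven repository actually contains that entry. The following is a minimal sketch, not part of Jepsen: the function names jna_prefix and check_jna_jars are hypothetical, the architecture mapping covers only the two architectures seen in this thread, and it assumes unzip is installed and that jars live under ~/.m2/repository.

```shell
# Map a `uname -m` value to the directory name JNA uses inside its jar.
# Only x86_64 and aarch64 are mapped explicitly; any other value is passed
# through unchanged, which may not match JNA's real naming (assumption).
jna_prefix() {
  case "$1" in
    x86_64|amd64)  echo "linux-x86-64" ;;
    aarch64|arm64) echo "linux-aarch64" ;;
    *)             echo "linux-$1" ;;
  esac
}

# For every net.java.dev.jna/jna jar under the given Maven repository
# (default: ~/.m2/repository), report whether it bundles libjnidispatch.so
# for the given prefix.
check_jna_jars() {
  prefix="$1"
  repo="${2:-$HOME/.m2/repository}"
  command -v unzip >/dev/null 2>&1 || { echo "unzip not found" >&2; return 1; }
  find "$repo" -path '*/net/java/dev/jna/jna/*' -name '*.jar' 2>/dev/null |
  while read -r jar; do
    if unzip -l "$jar" 2>/dev/null | grep -qF "com/sun/jna/$prefix/libjnidispatch.so"; then
      echo "ok      $jar"
    else
      echo "missing $jar"
    fi
  done
}
```

On the control node this could be run as check_jna_jars "$(jna_prefix "$(uname -m)")". A "missing" line for the jar that ends up on the classpath would be consistent with the error above.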

I understand that Docker has been a source of constant pain and a "tirefire", but I really do want to get Jepsen tests working in Docker for its ease of use. Also, setting up LXC on macOS (arm) seems to be a pain.

rhishikeshj commented 2 years ago

@chhetripradeep Since you're the Docker whiz around here :) would love some pointers from you !

nurturenature commented 1 year ago

The Docker disappointment du jour is docker compose having lost, or being in the process of losing, the ability to configure os/systemd containers.

docker run can configure os/systemd containers.

See prior issue and mailing list for more discussion, links to Docker issues.

As a macOS user, depending on your os/systemd/app/etc versions, you may be able to use Jepsen's docker compose.

You may be able to use Docker's (only on macOS) deprecatedCgroupv1 setting.

The only cross platform way to use os/systemd containers at this time is to decompose docker compose into a series of individual docker run commands.

I created a set of scripts that attempt to minimally replicate Jepsen's docker compose environment at jepsen-docker-workaround. It is developed and tested using Docker Desktop (on Debian) and is expected to work with Docker Desktop on all platforms.

My personal experience is that using LXC (on Debian) to develop Jepsen tests has been both productive and enjoyable.

P.S. And for completeness, one can also override Docker's system-wide defaults by editing daemon.json: "default-cgroupns-mode": "host". (the current Docker documentation is incorrect in that it actually defaults to "private")

rhishikeshj commented 1 year ago

Hi @nurturenature :) Thanks for your input and help! I managed to get the Docker setup up and running via your scripts too.

Unfortunately, I am running into the same error:

Execution error (UnsatisfiedLinkError) at com.sun.jna.Native/loadNativeDispatchLibraryFromClasspath (Native.java:776).
Native library (com/sun/jna/linux-aarch64/libjnidispatch.so) not found in resource path <truncated list of all ~/.m2 resource paths>

To reiterate, I am testing on macOS using Docker Desktop running on an M1 chip. To boil this down to a minimal reproduction, I am running the most basic Clojure project, which does only this in the -main function:

(defn -main
  [& args]
  (sshj/agent-proxy)
  ;; (cli/run! (merge (cli/single-test-cmd {:test-fn stm-test})
  ;;                  (cli/serve-cmd))
  ;;           args)
  )

And to start the control node:

docker run -d -v <path-to-minimal-jepsen-test>:/jepsen --name jepsen_control jepsen_control
./bin/control
lein run test

This also proceeds to fail in the same manner.

I suspect this is related to OpenJDK and Docker Desktop running on an M1, but I am not sure what to try next. The interwebs are not much help, and I have already tried the few relevant things I did find, without success.

nurturenature commented 1 year ago

> I suspect this is related to the openjdk and docker desktop running on an M1

I am not knowledgeable re the M1 environment, but...

The errors described above indicate the control container is a macOS image. Using either the Jepsen compose or workaround environment results in Debian containers for the control and database nodes.

I suggest using either the Jepsen or workaround environment fully, e.g. ./bin/up and ./bin/console|control rather than the individual docker run command, and then confirming that the control container is the intended one:

root@control:/jepsen# uname -a
Linux control 5.10.104-linuxkit #1 SMP Thu Mar 17 17:08:06 UTC 2022 x86_64 GNU/Linux

root@control:/jepsen# java -version
openjdk version "11.0.15" 2022-04-19
OpenJDK Runtime Environment (build 11.0.15+10-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.15+10-post-Debian-1deb11u1, mixed mode, sharing)

root@control:/jepsen# ssh n1
...
Linux n1 5.10.104-linuxkit #1 SMP Thu Mar 17 17:08:06 UTC 2022 x86_64
...

rhishikeshj commented 1 year ago

@nurturenature The behavior (error) is the same however I run the Jepsen control environment: docker-compose, the individual docker run commands, or a plain docker run of the jepsen_control image.

Here's the output from the uname and java -version commands:

root@107a76b79f8e:/jepsen# uname -a
Linux 107a76b79f8e 5.10.104-linuxkit #1 SMP PREEMPT Thu Mar 17 17:05:54 UTC 2022 aarch64 GNU/Linux
root@107a76b79f8e:/jepsen# java -version
openjdk version "11.0.15" 2022-04-19
OpenJDK Runtime Environment (build 11.0.15+10-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.15+10-post-Debian-1deb11u1, mixed mode)

Unfortunately, I suspect this is slightly more sinister. It might be due to the way Docker Desktop works on a Mac: it launches an Alpine image, which then launches these containers. I don't know for sure, though.

nurturenature commented 1 year ago

Thanks for trying and reporting back.

> I suspect this is slightly more sinister. ... launches an Alpine image which then launches these containers.

The Docker daemon being eponymous?

I'll add to the issue if I see anything relevant elsewhere.

yito88 commented 1 year ago

I faced the same issue. The version of JNA that Jepsen pulls in is 4.1.0, which doesn't support aarch64:

[jepsen "0.2.7"]
   [byte-streams "0.2.5-alpha2"]
     [clj-tuple "0.2.2"]
     [manifold "0.1.8"]
       [io.aleph/dirigiste "0.1.5"]
     [primitive-math "0.1.6"]
   [clj-ssh "0.5.14"]
     [com.jcraft/jsch.agentproxy.core "0.0.9"]
     [com.jcraft/jsch.agentproxy.jsch "0.0.9"]
     [com.jcraft/jsch.agentproxy.pageant "0.0.9"]
     [com.jcraft/jsch.agentproxy.sshagent "0.0.9"]
     [com.jcraft/jsch.agentproxy.usocket-jna "0.0.9"]
       [net.java.dev.jna/jna-platform "4.1.0"]
       [net.java.dev.jna/jna "4.1.0"]
...

It worked in my environment after excluding the old JNA artifacts and replacing them with a newer version:

                 [jepsen "0.2.7" :exclusions [net.java.dev.jna/jna
                                              net.java.dev.jna/jna-platform]]
                 [net.java.dev.jna/jna "5.11.0"]
                 [net.java.dev.jna/jna-platform "5.11.0"]
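
To confirm which JNA version a project actually resolves after a change like this, the output of lein deps :tree can be filtered for the JNA artifacts. This is only a sketch: jna_versions is a hypothetical helper name, and it assumes the tree is printed in the same [group/artifact "version"] form shown earlier in this thread.

```shell
# Read `lein deps :tree` output on stdin and print each distinct
# net.java.dev.jna artifact together with its quoted version.
jna_versions() {
  grep -o 'net\.java\.dev\.jna/[a-z-]* "[^"]*"' | sort -u
}
```

Usage from the project directory: lein deps :tree | jna_versions. If 4.1.0 still appears, the exclusions have probably not taken effect for that dependency path.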