jepsen-io / maelstrom

A workbench for writing toy implementations of distributed systems.

Permission denied when running `--bin` binary #42

Closed: philippgille closed this issue 1 year ago

philippgille commented 1 year ago

Hello :wave:,

I'm checking out the Fly distributed systems challenges, and for the "Echo" demo, after compiling the Go code and trying to run the maelstrom tests, I get this error:

[...]
INFO [2023-02-25 14:58:30,921] jepsen node n0 - maelstrom.net Starting Maelstrom network
INFO [2023-02-25 14:58:30,922] jepsen test runner - jepsen.db Tearing down DB
INFO [2023-02-25 14:58:30,923] jepsen test runner - jepsen.db Setting up DB
INFO [2023-02-25 14:58:30,925] jepsen node n0 - maelstrom.service Starting services: (lin-kv lin-tso lww-kv seq-kv)
INFO [2023-02-25 14:58:30,925] jepsen node n0 - maelstrom.db Setting up n0
INFO [2023-02-25 14:58:30,926] jepsen node n0 - maelstrom.process launching /home/johndoe/path/to/fly-dist-sys/1_echo []
INFO [2023-02-25 14:58:31,930] jepsen node n0 - maelstrom.net Shutting down Maelstrom network
WARN [2023-02-25 14:58:31,933] jepsen test runner - jepsen.core Test crashed!
java.io.IOException: Cannot run program "/home/johndoe/path/to/fly-dist-sys/1_echo" (in directory "/tmp"): error=13, Permission denied
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
    at maelstrom.process$start_node_BANG_.invokeStatic(process.clj:199)
    at maelstrom.process$start_node_BANG_.invoke(process.clj:168)
    at maelstrom.db$db$reify__16142.setup_BANG_(db.clj:34)
    at jepsen.db$fn__8729$G__8723__8733.invoke(db.clj:12)
    at jepsen.db$fn__8729$G__8722__8738.invoke(db.clj:12)
    at clojure.core$partial$fn__5908.invoke(core.clj:2642)
    at jepsen.control$on_nodes$fn__8599.invoke(control.clj:314)
    at clojure.lang.AFn.applyToHelper(AFn.java:154)
    at clojure.lang.AFn.applyTo(AFn.java:144)
    at clojure.core$apply.invokeStatic(core.clj:667)
    at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
    at clojure.lang.RestFn.applyTo(RestFn.java:142)
    at clojure.core$apply.invokeStatic(core.clj:671)
    at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at dom_top.core$real_pmap_helper$build_thread__211$fn__212.invoke(core.clj:163)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.AFn.applyTo(AFn.java:144)
    at clojure.core$apply.invokeStatic(core.clj:667)
    at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
    at clojure.lang.RestFn.invoke(RestFn.java:425)
    at clojure.lang.AFn.applyToHelper(AFn.java:156)
    at clojure.lang.RestFn.applyTo(RestFn.java:132)
    at clojure.core$apply.invokeStatic(core.clj:671)
    at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
    at clojure.lang.RestFn.invoke(RestFn.java:397)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: error=13, Permission denied
    at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
    at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
    at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
    ... 31 common frames omitted
ERROR [2023-02-25 14:58:31,938] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.io.IOException: Cannot run program "/home/johndoe/path/to/fly-dist-sys/1_echo" (in directory "/tmp"): error=13, Permission denied
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
    at maelstrom.process$start_node_BANG_.invokeStatic(process.clj:199)
    at maelstrom.process$start_node_BANG_.invoke(process.clj:168)
    at maelstrom.db$db$reify__16142.setup_BANG_(db.clj:34)
    at jepsen.db$fn__8729$G__8723__8733.invoke(db.clj:12)
    at jepsen.db$fn__8729$G__8722__8738.invoke(db.clj:12)
    at clojure.core$partial$fn__5908.invoke(core.clj:2642)
    at jepsen.control$on_nodes$fn__8599.invoke(control.clj:314)
    at clojure.lang.AFn.applyToHelper(AFn.java:154)
    at clojure.lang.AFn.applyTo(AFn.java:144)
    at clojure.core$apply.invokeStatic(core.clj:667)
    at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
    at clojure.lang.RestFn.applyTo(RestFn.java:142)
    at clojure.core$apply.invokeStatic(core.clj:671)
    at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at dom_top.core$real_pmap_helper$build_thread__211$fn__212.invoke(core.clj:163)
    at clojure.lang.AFn.applyToHelper(AFn.java:152)
    at clojure.lang.AFn.applyTo(AFn.java:144)
    at clojure.core$apply.invokeStatic(core.clj:667)
    at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
    at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
    at clojure.lang.RestFn.invoke(RestFn.java:425)
    at clojure.lang.AFn.applyToHelper(AFn.java:156)
    at clojure.lang.RestFn.applyTo(RestFn.java:132)
    at clojure.core$apply.invokeStatic(core.clj:671)
    at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
    at clojure.lang.RestFn.invoke(RestFn.java:397)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: error=13, Permission denied
    at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
    at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
    at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
    ... 31 common frames omitted

The compiled Go binary, as well as maelstrom, belongs to the johndoe user and group.

I'm running maelstrom like this: ./maelstrom test -w echo --bin ~/path/to/fly-dist-sys/1_echo --node-count 1 --time-limit 10

I'm on Linux (Fedora).

Am I doing anything wrong? Should permissions be set differently?

aphyr commented 1 year ago

I'm not sure! I'm assuming you're running maelstrom itself as johndoe. Is it possible that 1_echo isn't executable? What happens when you try to run 1_echo by itself?

matthiasr commented 1 year ago

What is the output of ls -la /home/johndoe/path/to/fly-dist-sys/1_echo?

I am also intrigued by the (in directory /tmp) message. Could it be that your /tmp is mounted with the noexec option, and something about your setup means the actual executable is somewhere in /tmp?

philippgille commented 1 year ago

> I'm assuming you're running maelstrom itself as johndoe

Yes exactly.

> Is it possible that 1_echo isn't executable?

I compiled the code with go build, which results in an executable file:

$ ll ~/path/to/fly-dist-sys/1_echo/
total 2408
-rwxr-xr-x. 1 johndoe johndoe 2450246 Feb 25 14:55 1_echo
-rw-r--r--. 1 johndoe johndoe     143 Feb 25 14:35 go.mod
-rw-r--r--. 1 johndoe johndoe     251 Feb 25 14:35 go.sum
-rw-r--r--. 1 johndoe johndoe     586 Feb 25 14:34 main.go

> What happens when you try to run 1_echo by itself?

It starts to run and (I assume) waits for the echo message. Here I'm waiting a bit and then entering "foo" followed by Enter:

$ ./1_echo 
foo
2023/02/27 20:09:54 unmarshal message: invalid character 'o' in literal false (expecting 'a')
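
The unmarshal error above is what a JSON decoder reports for plain text: "foo" starts with `f`, so the parser expects the literal `false` and fails on the `o`. Below is a minimal sketch of that behavior, assuming the node decodes each input line as a JSON envelope. The `message` type and `decodeLine` helper are hypothetical stand-ins, not code from the issue author's `main.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is a hypothetical stand-in for the envelope a node decodes.
// The field names follow the src/dest/body shape of Maelstrom messages.
type message struct {
	Src  string          `json:"src"`
	Dest string          `json:"dest"`
	Body json.RawMessage `json:"body"`
}

// decodeLine parses one line of input as a JSON message.
func decodeLine(line string) (message, error) {
	var m message
	err := json.Unmarshal([]byte(line), &m)
	return m, err
}

func main() {
	// Plain text is not valid JSON, so decoding fails.
	_, err := decodeLine("foo")
	fmt.Println("plain text:", err)

	// A JSON object with the expected fields decodes cleanly.
	m, err := decodeLine(`{"src":"c1","dest":"n1","body":{"type":"echo","msg_id":1,"echo":"hi"}}`)
	fmt.Println("JSON:", m.Src, m.Dest, string(m.Body), err)
}
```

So the binary itself starts and reads stdin as expected; it simply rejects input that is not JSON.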

> What is the output of ls -la /home/johndoe/path/to/fly-dist-sys/1_echo?

$ ls -la ~/path/to/fly-dist-sys/1_echo/1_echo 
-rwxr-xr-x. 1 johndoe johndoe 2450246 Feb 27 20:08 /home/johndoe/path/to/fly-dist-sys/1_echo/1_echo

> I am also intrigued by the (in directory /tmp) message. Could it be that your /tmp is mounted with the noexec option, and something about your setup means the actual executable is somewhere in /tmp?

I was wondering about the /tmp as well. I assumed that maybe maelstrom moves files around before executing them. From my side the code is in my regular home directory, and the executable (compiled with go build) as well.


Additional info:

philippgille commented 1 year ago

OK sorry guys, found the issue: me :see_no_evil:

After seeing https://github.com/jepsen-io/maelstrom/issues/37, which turned out to be a mix-up about what is executable and what is not, I questioned whether I had run everything correctly.

From my first :arrow_up: post:

> I'm running maelstrom like this: ./maelstrom test -w echo --bin ~/path/to/fly-dist-sys/1_echo --node-count 1 --time-limit 10

=> The problem is that ~/path/to/fly-dist-sys/1_echo is the directory. I mixed it up because the binary inside it has the same name: ~/path/to/fly-dist-sys/1_echo/1_echo.

Sorry for wasting your time! :bow:

aphyr commented 1 year ago

Hmm. Well that all seems to be in order. Maelstrom doesn't move anything around; it tries to invoke your executable without arguments, in place, using the full path to whatever --bin you provided. When it runs your program it does use the default temporary directory as cwd (specifically, whatever java.io.tmpdir is set to). I suppose if it's... trying to write files and you don't have write access to /tmp that might explode...

philippgille commented 1 year ago

One idea to keep other people from running into this: what confused me was all the log output before the "permission denied" error, plus the mention of the /tmp directory. Together they made it look less like a clear user-side error. Proposal: would it work to check whether the --bin argument points to an executable file and, if not, print only that?

aphyr commented 1 year ago

I don't mind a more specific error message here ("permission denied" is slightly misleading; perhaps "is not a file" would be more helpful), but I'd advise against trying to predict and detect this error before actually getting there.