ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License

Random "Permission denied" errors on macOS ARM64 #4482

Open christoph-cullmann opened 3 years ago

christoph-cullmann commented 3 years ago

We are trying to compile packages such as ounit2 but get random errors on macOS ARM64. Everything works fine on other platforms, e.g. macOS x86-64, Linux, and Windows.

Expected Behavior

Compile works.

Actual Behavior

Random compile errors such as:

Running[55]: (cd _build/default && /makefactory/usr/20210413/191623/debug/macos64/bin/ocamlc.opt -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs -g -bin-annot -I src/lib/ounit2/advanced/.oUnitAdvanced.objs/byte -I /makefactory/usr/20210413/191623/debug/macos64/lib/ocaml/bytes -I /makefactory/usr/20210413/191623/debug/macos64/lib/ocaml/stdlib-shims -no-alias-deps -opaque -o src/lib/ounit2/advanced/.oUnitAdvanced.objs/byte/oUnitRunnerProcesses.cmo -c -impl src/lib/ounit2/advanced/oUnitRunnerProcesses.ml)
Error: Permission denied

This output was produced by:

'dune' 'build' '-j2' '--profile=dev' '--display=verbose' '-p' 'ounit2'

Reproduction

Unfortunately, the error is very sporadic. Once a compile has failed, however, the error persists in the same source tree and build directory.

Specifications

I use OCaml 4.12.0 and dune 2.8.5 on macOS ARM64 (latest macOS release, with all patches applied).

avsm commented 3 years ago

If the error is persistent, could you inspect the output file in _build and see what its permissions are?

christoph-cullmann commented 3 years ago

In that case, the .cmo and similar output files seem to have read-only permissions (r--r--r--).

christoph-cullmann commented 3 years ago

I have now updated to the current master version of dune.

With the current master version, the issue seems to occur less often, but I still get errors like:

Running[54]: (cd _build/default && /makefactory/usr/20210518/145501/release/macos64/bin/ocamlc.opt -w -40 -g -bin-annot -I src/lib/ounit2/advanced/.oUnitAdvanced.objs/byte -I /makefactory/usr/20210518/145501/release/macos64/lib/ocaml/bytes -I /makefactory/usr/20210518/145501/release/macos64/lib/ocaml/stdlib-shims -no-alias-deps -o src/lib/ounit2/advanced/.oUnitAdvanced.objs/byte/oUnitRunnerProcesses.cmo -c -impl src/lib/ounit2/advanced/oUnitRunnerProcesses.ml)
Error: Permission denied
-> required by _build/default/src/lib/ounit2/threads/.oUnitThreads.objs/native/oUnitThreads__.cmx
-> required by _build/install/default/lib/ounit2/threads/oUnitThreads__.cmx
-> required by _build/default/ounit2.install
-> required by alias install

christoph-cullmann commented 3 years ago

I still get these errors with latest master.

Is there a way to get a more meaningful error message that includes where the actual permission error happened?

In the CI I just see:

Actual targets:
- recursive alias @install
Error: Permission denied
-> required by _build/default/src/lib/ounit2/advanced/oUnitAdvanced.a
-> required by _build/install/default/lib/ounit2/advanced/oUnitAdvanced.a
-> required by _build/default/ounit2.install
-> required by alias install
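In the message above, dune names the target that needed the failing file but does not say which file the failing call was made on. One way to keep that information is to catch Sys_error where the digest is computed and re-raise it in a dedicated exception that carries the path, so it cannot be lost however the message is later formatted. A minimal stdlib-only OCaml sketch; digest_with_context, Digest_failed and describe are hypothetical names, not dune APIs:

```ocaml
(* Hypothetical helper, not part of dune: digest a file and, if opening or
   reading it fails, re-raise the error together with the offending path. *)
exception Digest_failed of { path : string; reason : string }

let digest_with_context path =
  try Digest.to_hex (Digest.file path)
  with Sys_error reason -> raise (Digest_failed { path; reason })

(* Render the error so the log shows which file could not be read. *)
let describe = function
  | Digest_failed { path; reason } ->
    Printf.sprintf "digest of %s failed: %s" path reason
  | e -> Printexc.to_string e

let () =
  (* Usage: report the path instead of a bare "Permission denied". *)
  match digest_with_context "/nonexistent/oUnitAdvanced.a" with
  | hex -> print_endline hex
  | exception e -> prerr_endline (describe e)
```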

bschommer commented 3 years ago

The issue still persists. We enabled backtraces, and it seems that the problem arises during digest computation:

Actual targets:
- recursive alias @install
Error: Permission denied
Raised by primitive operation at Stdlib in file "stdlib.ml" (inlined), line
  473, characters 0-64
Called from Stdlib__digest.file in file "digest.ml", line 41, characters
  11-22
Called from Stdune__digest.file_with_executable_bit in file
  "otherlibs/stdune-unstable/digest.ml", line 81, characters 23-32
Called from Dune_engine__cached_digest.refresh_exn in file
  "src/dune_engine/cached_digest.ml", line 154, characters 15-46
Called from Fiber.O.(>>|).(fun) in file "src/fiber/fiber.ml", line 288,
  characters 36-41
Called from Fiber.Execution_context.run_jobs in file "src/fiber/fiber.ml",
  line 204, characters 8-13
-> required by _build/default/src/sexplib0.a
-> required by _build/install/default/lib/sexplib0/sexplib0.a
-> required by _build/default/sexplib0.install
-> required by alias install
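The backtrace bottoms out in Stdlib's Digest.file, called from Stdune's file_with_executable_bit. In the OCaml 4.12 standard library, Digest.file first opens the file with open_in_bin and only then hashes the channel; "digest.ml, line 41, characters 11-22" appears to point at that open_in_bin call, which would mean the open itself is refused before any byte is read. A minimal sketch mirroring that two-stage structure (digest_file and failing_stage are hypothetical names) can tell an open failure from a read failure:

```ocaml
(* Mirrors the structure of Stdlib's Digest.file: open in binary mode,
   then hash everything up to end-of-file. *)
let digest_file filename =
  let ic = open_in_bin filename in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> Digest.channel ic (-1))

(* Hypothetical diagnostic: report whether opening or reading failed,
   or None if the whole digest succeeded. *)
let failing_stage filename =
  match open_in_bin filename with
  | exception Sys_error msg -> Some ("open: " ^ msg)
  | ic ->
    Fun.protect
      ~finally:(fun () -> close_in_noerr ic)
      (fun () ->
        match Digest.channel ic (-1) with
        | _ -> None
        | exception Sys_error msg -> Some ("read: " ^ msg))
```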

christoph-cullmann commented 3 years ago

With the latest main branch of dune, the errors are more verbose:

Running[1]: (cd src && /makefactory/usr/20211111/162801/release/macos64/bin/ocaml -I +compiler-libs /Volumes/makefactory/ocaml_stdlib_shims-CX75LdZt/ocaml-stdlib-shims/src/_build/.dune/default/src/dune.ml)
Actual targets:
- alias @@default
File "_build/.dune/default/src/dune", line 2, characters 0-87:
2 | (library
3 |  (wrapped false)
4 |  (name stdlib_shims)
5 |  (modules )
6 |  (public_name stdlib-shims))
Error: File unavailable:
/makefactory/usr/20211111/162801/release/macos64/bin/ocamlc.opt
Sys_error("Permission denied")
Raised at Stdune__user_error.raise in file
  "otherlibs/stdune-unstable/user_error.ml", line 10, characters 2-49
Called from Fiber.O.(>>|).(fun) in file "src/fiber/fiber.ml", line 288,
  characters 36-41
Called from Fiber.Execution_context.run_jobs in file "src/fiber/fiber.ml",
  line 204, characters 8-13
File "_build/.dune/default/src/dune", line 2, characters 0-87:
2 | (library
3 |  (wrapped false)
4 |  (name stdlib_shims)
5 |  (modules )
6 |  (public_name stdlib-shims))
Error: File unavailable:
/makefactory/usr/20211111/162801/release/macos64/bin/ocamlopt.opt
Sys_error("Permission denied")
Raised at Stdune__user_error.raise in file
  "otherlibs/stdune-unstable/user_error.ml", line 10, characters 2-49
Called from Fiber.O.(>>|).(fun) in file "src/fiber/fiber.ml", line 288,
  characters 36-41
Called from Fiber.Execution_context.run_jobs in file "src/fiber/fiber.ml",
  line 204, characters 8-13

/makefactory/usr/20211111/162801/release/macos64/bin/ocamlopt.opt is located on an NFS share. We also tried SMB in the past, with similar errors. Interestingly, no other tool has problems executing anything from there. Any ideas what this could be? Does dune do some extra permission check, or just a plain exec/spawn/...?
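The "File unavailable" errors suggest that dune is reading (digesting) ocamlc.opt and ocamlopt.opt as rule dependencies, not only executing them, so the failing operation is likely the same open-and-read seen in the earlier backtrace rather than an exec. A hypothetical probe, not something proposed in this thread, can check whether repeated reads of a binary on the share fail outside dune: it digests the same file n times and counts Sys_error failures. count_digest_failures is an illustrative name.

```ocaml
(* Hypothetical diagnostic, not part of dune: digest [path] [n] times and
   count how many attempts fail with Sys_error, optionally logging each. *)
let count_digest_failures ?(on_error = fun (_ : string) -> ()) path n =
  let failures = ref 0 in
  for _i = 1 to n do
    match Digest.file path with
    | _ -> ()
    | exception Sys_error msg ->
      incr failures;
      on_error msg
  done;
  !failures

let () =
  (* Usage: ocaml probe.ml /path/on/nfs/bin/ocamlopt.opt 1000 *)
  if Array.length Sys.argv = 3 then begin
    let path = Sys.argv.(1) in
    let n = int_of_string Sys.argv.(2) in
    let failed = count_digest_failures ~on_error:prerr_endline path n in
    Printf.printf "%d of %d reads of %s failed\n" failed n path
  end
```

If the probe also fails intermittently, the problem lies below dune; if it never fails, something dune-specific, such as the concurrency enabled by -j, becomes more suspect.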

rgrinberg commented 1 year ago

@bschommer @christoph-cullmann could one of you try this patch?

diff --git a/src/dune_engine/execution_parameters.ml b/src/dune_engine/execution_parameters.ml
index 5f197ee61..9ecfeadc2 100644
--- a/src/dune_engine/execution_parameters.ml
+++ b/src/dune_engine/execution_parameters.ml
@@ -96,8 +96,7 @@ let set_add_workspace_root_to_build_path_prefix_map x t =

 let dune_version t = t.dune_version

-let should_remove_write_permissions_on_generated_files t =
-  t.dune_version >= (2, 4)
+let should_remove_write_permissions_on_generated_files _ = false

 let expand_aliases_in_sandbox t = t.expand_aliases_in_sandbox
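The patch swaps a version check for a constant. To make the before/after behavior concrete, here is a simplified model; the record type t is a stand-in for dune's Execution_parameters.t, and dune_version is assumed to hold the project's (major, minor) dune language version. OCaml compares tuples lexicographically, so the original check holds for 2.4 and every later version.

```ocaml
(* Stand-in for dune's execution parameters; only the field used here.
   Assumption: [dune_version] is the project's (major, minor) dune
   language version. *)
type t = { dune_version : int * int }

(* Before the patch: targets become read-only for dune lang >= 2.4.
   Tuples compare lexicographically, so (2, 10) >= (2, 4) holds. *)
let should_remove_write_permissions_before t = t.dune_version >= (2, 4)

(* After the patch: write permissions are never removed. *)
let should_remove_write_permissions_after (_ : t) = false

let () =
  List.iter
    (fun v ->
      let t = { dune_version = v } in
      Printf.printf "lang %d.%d: before=%b after=%b\n" (fst v) (snd v)
        (should_remove_write_permissions_before t)
        (should_remove_write_permissions_after t))
    [ (2, 3); (2, 4); (3, 0) ]
```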

Regarding "Any ideas what this could be?": the error comes from trying to read the digest of the rule's targets after making them read-only.
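The sequence described here can be modeled in a few lines of stdlib-only OCaml: a rule writes its target, the target ends up read-only (r--r--r--, the permissions reported earlier in the thread), and the target is then digested. The sketch creates the file with mode 0o444 rather than calling chmod afterwards, to avoid depending on the Unix library; write_readonly_target and digest_target are hypothetical names. On a local disk the final read succeeds, because a read-only file is still readable by its owner, so the r--r--r-- mode alone does not explain a Permission denied at this step.

```ocaml
(* Hypothetical model of a rule: write the target, leave it read-only,
   then digest it. The file is created with mode 0o444 instead of being
   chmod'ed afterwards, so only the standard library is needed. *)
let write_readonly_target path contents =
  (* With O_CREAT, the descriptor returned by this open is still writable;
     the 0o444 mode only affects later opens of the file. *)
  let oc =
    open_out_gen [ Open_wronly; Open_creat; Open_excl; Open_binary ] 0o444 path
  in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc contents)

(* Digest the read-only target, as happens after the rule has run. *)
let digest_target path = Digest.to_hex (Digest.file path)

let () =
  let path = Filename.temp_file "dune_target" ".cmo" in
  Sys.remove path;
  write_readonly_target path "compiled bytes";
  (* Read-only is still readable, so on a local disk this succeeds. *)
  print_endline (digest_target path);
  Sys.remove path
```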

christoph-cullmann commented 1 year ago

I can give this a try, but it will take some time.

christoph-cullmann commented 1 year ago

Rereading the bug: our main remaining issue is that executing tools from NFS or SMB shares fails randomly under dune. We 'solved' that by copying the OCaml toolchain to the local disk; see e.g.

https://github.com/ocaml/dune/issues/4482#issuecomment-966416067

I don't see how that change could help with that, but perhaps I misunderstand the change.