ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.64k stars 409 forks source link

`dune build --watch` crashes when it detects filesystem change on macOS 13.1 on some Apple Silicon machines #6907

Closed dmzimmerman closed 1 year ago

dmzimmerman commented 1 year ago

Expected Behavior

I expected it to watch the filesystem and rebuild as necessary, without crashes.

Actual Behavior

It crashes every time the filesystem changes.

Reproduction

Start with any Dune project (I happened to use https://github.com/OCamlPro/ocaml-solidity). Open two terminals and change working directory in both to the root directory of the project. Run dune build, then dune build --watch in one (the reason for the initial build run is that if the project needs building, dune build --watch crashes immediately with the same error as you'll see on a filesystem change). In the other, change the filesystem - either by editing one of the files, or even something like touch foo to create a new file that Dune wouldn't even need to build. It should crash immediately with the message Internal error, please report upstream including the contents of _build/log and a stack trace (which you can see in the verbose gist below) that seems to indicate an as_outside_build_dir_exn being thrown.

Specifications

Additional information

I replicated this on completely clean installs of OCaml 4.14.1 and Dune via OPAM (i.e., rm -rf ~/.opam; opam init; opam switch create 4.14.1 followed by opam installation of Dune and the required libraries for the project I was building, on both a MacBook Pro/M1 Pro/32GB RAM and a Mac Studio/M1 Ultra/128GB RAM. Colleagues of mine who have tried the same on 16GB M1 or M1 Pro machines do not seem to be experiencing this issue, but I don't know if that's just a coincidence or something meaningful.

mro commented 1 year ago

same here, any news?

mro commented 1 year ago

dune.3.7.0 shows the same crash. log.txt I renamed _build/log so as the upload here works.

nojb commented 1 year ago

Just an update: I will try to take a look (probably won't happen before next week -- I need to procure an M1 mac).

nojb commented 1 year ago

I tried, but could not reproduce in a M2 Macbook Air with OCaml 4.14.1 and Dune 3.7.0. Unfortunately, it is impossible to investigate with a reproducible case.

By the way, there is another bug report involving M1 macs and the file watching mechanism in #6151. In principle they do not seem related, but I thought I would mention it just in case.

If you can set up remote access to a machine where the issue can be reproduced, then I can try to investigate, otherwise I don't have any other ideas to propose, sorry!

rgrinberg commented 1 year ago

By the way, there is another bug report involving M1 macs and the file watching mechanism in https://github.com/ocaml/dune/issues/6151. In principle they do not seem related, but I thought I would mention it just in case.

It's not related. The bug appears on every mac and I can readily reproduce it. In fact, you've pointed out the bug to me in a private slack chat before and exactly what was causing it :)

mro commented 1 year ago

If you can set up remote access

send me a ssh key to work@mro.name. At my office I am behind a NAT, so we have to schedule a time slot and I will be in a sensible network. But it will be slow. 12MBit downstream.

nojb commented 1 year ago

If you can set up remote access

send me a ssh key to work@mro.name. At my office I am behind a NAT, so we have to schedule a time slot and I will be in a sensible network. But it will be slow. 12MBit downstream.

Done, thanks. Let's see if we can coordinate and figure out what is going on...

nojb commented 1 year ago

By the way, there is another bug report involving M1 macs and the file watching mechanism in #6151. In principle they do not seem related, but I thought I would mention it just in case.

It's not related. The bug appears on every mac and I can readily reproduce it. In fact, you've pointed out the bug to me in a private slack chat before and exactly what was causing it :)

Right, thanks for the reminder :)

mro commented 1 year ago

@nojb quite typical the crash is shy and doesn't show in a fresh account.

Also it seems to go away if I purify the messy PATH in my everyday account from

$ echo $PATH /Users/mro/.opam/4.12.1/bin:/Users/mro/bin:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/Apple/usr/bin:/Applications/Little Snitch.app/Contents/Components:/usr/local/go/bin:/usr/local/MacGPG2/bin:/opt/homebrew/opt/fzf/bin

to

$ echo $PATH /Users/mro/.opam/4.12.1/bin:/bin/:/usr/bin:/usr/local/bin/

If nothing jumps at you, give me some days of fiddling with the PATH to narrow it down or have a workaround.

Debugging via ssh access IMO doesn't make sense right now.

nojb commented 1 year ago

If nothing jumps at you, give me some days of fiddling with the PATH to narrow it down or have a workaround.

Nothing specific, but the dune watching mechanism will "watch" every directory in PATH (since changes in these directories may affect binaries which are used in the Dune rules), so it sounds plausible that there may be a link here.

Debugging via ssh access IMO doesn't make sense right now.

OK! We can revisit that possibility in the future as needed. Thanks.

rgrinberg commented 1 year ago

@nojb quite typical the crash is shy and doesn't show in a fresh account.

Also it seems to go away if I purify the messy PATH in my everyday account from

$ echo $PATH /Users/mro/.opam/4.12.1/bin:/Users/mro/bin:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/Apple/usr/bin:/Applications/Little Snitch.app/Contents/Components:/usr/local/go/bin:/usr/local/MacGPG2/bin:/opt/homebrew/opt/fzf/bin

to

$ echo $PATH /Users/mro/.opam/4.12.1/bin:/bin/:/usr/bin:/usr/local/bin/

If nothing jumps at you, give me some days of fiddling with the PATH to narrow it down or have a workaround.

Debugging via ssh access IMO doesn't make sense right now.

What is the path of the project that you're building with dune?

mro commented 1 year ago

fish is my default shell. $ dune runtest --watch comes up, but the first edit crashes dune.

But if I launch a new terminal, start a bash, cd into my project, do eval $(opam env) the $ dune runtest --watch runs stable.

dmzimmerman commented 1 year ago

I can confirm that simplifying my $PATH in the same way as @mro also makes the crash go away for me. The path to the project that I'm building is outside all of the directories in my $PATH regardless of that simplification (it's just a folder a few levels deep in my home directory).

rgrinberg commented 1 year ago

@dmzimmerman do you have any paths in $PATH that don't exist?

dmzimmerman commented 1 year ago

I did... I'm also embarrassed to say that . was somehow in my $PATH (I have no idea what startup script put that in there, but it will be fixed momentarily). Removing . fixed it; removing the paths in $PATH that don't exist actually had no effect (I even tried putting one back after having removed ., and it didn't cause the crash again).

mro commented 1 year ago

not after cleaning, but, look at what opam env brings:

$ opam env
set -gx OPAM_SWITCH_PREFIX '/Users/mro/.opam/4.12.1';
set -gx CAML_LD_LIBRARY_PATH '/Users/mro/.opam/4.12.1/lib/stublibs:/Users/mro/.opam/4.12.1/lib/ocaml/stublibs:/Users/mro/.opam/4.12.1/lib/ocaml';
set -gx OCAML_TOPLEVEL_PATH '/Users/mro/.opam/4.12.1/lib/toplevel';
set -gx PKG_CONFIG_PATH '/Users/mro/.opam/4.12.1/lib/pkgconfig';
set -gx PATH ':/Users/mro/.opam/4.12.1/bin' '/opt/homebrew/bin' '/bin' '/sbin' '/usr/bin' '/usr/sbin' '/usr/local/bin' '/usr/local/MacGPG2/bin' '/Users/mro/bin';

the leading colon in $PATH sets the ., obviously.

After setting $PATH without . - 🎉 No Crash!

$ echo $SHELL
/opt/homebrew/bin/fish
$ dune --version
3.7.0
$ opam --version
2.0.7
$ opam switch
#  switch  compiler                    description
   4.10.2  ocaml-base-compiler.4.10.2  4.10.2
→  4.12.1  ocaml-base-compiler.4.12.1  4.12.1
   4.14.0  ocaml-base-compiler.4.14.0  4.14.0
   5.0.0   ocaml-base-compiler.5.0.0   5.0.0
$ uname -mprsv
Darwin 20.6.0 Darwin Kernel Version 20.6.0: Fri Dec 16 00:34:59 PST 2022; root:xnu-7195.141.49~1/RELEASE_ARM64_T8101 arm64 arm
mro commented 1 year ago

a workaround:

$ eval (opam env | sed -e "s/':/'/g;s/'\.'//g")
nojb commented 1 year ago

It is probably a good idea to robustify Dune by eg only watching absolute directories in $PATH.

nojb commented 1 year ago

Should we close this?

Alizter commented 1 year ago

I can add a patch that checks for dirs in PATH being absolute and then close.

nojb commented 1 year ago

It is probably a good idea to robustify Dune by eg only watching absolute directories in $PATH.

OK, sounds good!

Alizter commented 1 year ago

Well not so sure anymore. Here is what we are doing:

let parse_path ?(sep = path_sep) s =
  String.split s ~on:sep
  |> List.filter_map ~f:(function
       | "" -> None
       | p -> Some (Path.of_filename_relative_to_initial_cwd p))

The assumption is that if there is a relative path in PATH then we treat it relative to the current working directory. Is this sane behaviour or should we safely enforce absolute paths?

I would like to understand why we have a crash if the path does not exist.

nojb commented 1 year ago

Is this sane behaviour or should we safely enforce absolute paths?

Independently of the present issue, I don't think treating it relative to the current working directory is a good idea; it makes the "watched set" of directories depend on where you are when you launch Dune. So I would filter out relative paths from PATH. This is a heuristic of course, but I think it will be correct most of the time.

I would like to understand why we have a crash if the path does not exist.

Indeed, this sounds like it should be handled in the Fsevents bindings. As a data point, in the Windows file watching backend, the existence of the path is checked, and if it doesn't exist it is ignored. (To be clear, I haven't checked if the FSEvents bindings do this check or not, I am just guessing.) cc @rgrinberg

Alizter commented 1 year ago

I've made a patch to ignore relative paths but I don't think we should close this issue until we sort out the possible issue with Fsevents.

rgrinberg commented 1 year ago

I don't think it's a good idea to ignore relative paths in PATH. If they are interpreted by the rest of the system and not dune, that makes dune inconsistent and poorly behaved.

rgrinberg commented 1 year ago

Do you mind sending a test for this issue first? I think it's just a matter of including CWD in PATH and trying to build.

Alizter commented 1 year ago

@rgrinberg I am not able to reproduce on Linux. I did something along the lines of our watching tests:

Building with relative paths in the PATH env var.

  $ . ./helpers.sh

  $ cat > dune-project << EOF
  > (lang dune 3.8)
  > EOF
  $ touch foo.opam

  $ cat > a.ml << EOF
  > let () = print_endline "Hello, world!"
  > EOF

  $ cat > new-contents << EOF
  > 
  > let () = print_endline "Goodbye, world!"
  > EOF

  $ cat > dune << EOF
  > (library
  >  (name a)
  >  (public_name foo.a))
  > EOF

  $ export PATH=./doesnt-exist/:$PATH

  $ start_dune

  $ build foo.install
  Success

  $ cat new-contents > b.ml

  $ cat > a.ml << EOF
  > include B
  > 
  > let () = print_endline "Hello, world!"
  > EOF

  $ build foo.install
  Success

  $ with_timeout dune shutdown
  $ cat .#dune-output
  Success, waiting for filesystem changes...
  Success, waiting for filesystem changes...
$ dune exec -- ./a.exe

If I understood the issue correctly, the second build should fail.

Alizter commented 1 year ago

@dmzimmerman If you are able to make a cram test that you can reproduce on apple silicon that would be much appreciated. It would help us understand what the real issue is here.

mro commented 1 year ago

@Alizter in my case it was unrelated to apple silicon but rather to the leading colon in the fish $PATH with my 3-years old opam version.

So IMO close

Alizter commented 1 year ago

@mro I see. The leading colon in the PATH variable is still something we ought to be resilient against however.

dmzimmerman commented 1 year ago

I'm not exactly sure how to make a cram test for this, because it requires environment variables to be set in a specific way, so it'd be more like a shell script... but boiling it down to its essence, if I run, in some directory:

git clone https://github.com/OCamlPro/ocaml-solidity
cd ocaml-solidity
export PATH=/opt/homebrew/bin:/usr/bin:.
eval $(opam env)
dune build --watch

the crash happens every time. This is with OPAM installed via Homebrew, and my default switch (4.14.1), but it doesn't matter which switch I use, it still crashes. The choice of ocaml-solidity isn't anything special, it just happens to be an OCaml project that I'm working with that has a dune configuration; it happens with every other project I've tried.