ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org

Program failure when using OCaml + M1 Mac Intel Emulator #13096

Open rbwilson-maker opened 1 month ago

rbwilson-maker commented 1 month ago

I encountered a strange bug which prevents me from running my OCaml code. My system:

Chip: Apple M1 Pro
Opam Version: Mach-O 64-bit executable x86_64
OCaml Version: Mach-O 64-bit executable x86_64

Opam was installed using brew, which for some reason was also an Intel version.

When I run my OCaml program (it’s a large program, a compiler) it sometimes hangs with this error:

rosetta error: unexpectedly need to EmulateForward on a synchronous exception x86_rip=0x4378862352
arm_pc=0x4390014496 num_insts=4 inst_index=3 x86 instruction bytes: 0x6215344901283465301 0x18125098399031709779

This bug appeared only after I added the following segment of code (I am using the Jane Street Core library):

let reachable = Hash_set.create (module Symbol) in
let rec explore_block (block : Symbol.t) : unit = 
  Hash_set.add reachable block;
  let block_data = Hashtbl.find_exn cfg.graph block in
  match block_data.succ with
  | `None -> ()
  | `One l -> explore_block l
  | `Two (l1, l2) -> explore_block l1; explore_block l2
in
explore_block cfg.root;
Hashtbl.filter_keys_inplace cfg.graph ~f:(Hash_set.mem reachable);

When I remove this block of code, the issue no longer appears. I have encountered this bug in no other scenarios, only this one.

tmcgilchrist commented 1 month ago

Thanks for the report @rbwilson-maker

What version of macOS, opam and OCaml are you using? Running these commands in a Terminal will give you that information.

$ sw_vers
$ opam --version
$ ocaml --version
$ opam list |grep core

Are you able to provide a minimal program using that code which will trigger the behaviour? Is there a reason why you are using opam via Rosetta? A native arm64 version of OCaml/opam is available.

These issues https://github.com/golang/go/issues/42700 and https://github.com/dotnet/runtime/issues/44958 suggest it might be caused by Rosetta bugs and/or interactions with signals.
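For the minimal program requested above, a standalone starting point might look like the sketch below. It is only an illustration: it uses the standard library's Hashtbl rather than Core, plain strings in place of Symbol.t, and the names cfg, block_data, prune_unreachable and example_cfg are hypothetical stand-ins for the reporter's definitions. It also adds a visited check that the reported snippet does not have. Whether the real CFG contains cycles is not stated in this thread, but if it does, the original explore_block would recurse without bound, which may be worth ruling out separately from any Rosetta behaviour.

```ocaml
(* Hypothetical stand-ins for the reporter's types: block labels are plain
   strings here instead of Symbol.t, and the CFG is a stdlib Hashtbl. *)
type succ = [ `None | `One of string | `Two of string * string ]

type block_data = { succ : succ }

type cfg = { root : string; graph : (string, block_data) Hashtbl.t }

(* Same traversal shape as the reported snippet, plus a visited check
   (an addition, not part of the report) so it terminates on cycles. *)
let prune_unreachable (cfg : cfg) : unit =
  let reachable : (string, unit) Hashtbl.t = Hashtbl.create 16 in
  let rec explore_block (block : string) : unit =
    if not (Hashtbl.mem reachable block) then begin
      Hashtbl.replace reachable block ();
      let block_data = Hashtbl.find cfg.graph block in
      match block_data.succ with
      | `None -> ()
      | `One l -> explore_block l
      | `Two (l1, l2) -> explore_block l1; explore_block l2
    end
  in
  explore_block cfg.root;
  (* Keep only the blocks reached from the root. *)
  Hashtbl.filter_map_inplace
    (fun label data -> if Hashtbl.mem reachable label then Some data else None)
    cfg.graph

(* A tiny CFG with a loop (b -> c -> b) and one unreachable block. *)
let example_cfg () : cfg =
  let graph = Hashtbl.create 8 in
  List.iter
    (fun (label, succ) -> Hashtbl.replace graph label { succ })
    [ ("a", `One "b");
      ("b", `Two ("c", "d"));
      ("c", `One "b");
      ("d", `None);
      ("dead", `One "a") ];
  { root = "a"; graph }

let () =
  let cfg = example_cfg () in
  prune_unreachable cfg;
  Printf.printf "blocks kept: %d\n" (Hashtbl.length cfg.graph)
```

Removing the `if not (Hashtbl.mem reachable block)` guard and running the same input under both the x86_64 and arm64 toolchains would be one way to check whether unbounded recursion alone is enough to trigger the Rosetta error.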

tmcgilchrist commented 1 month ago

Potentially related to https://github.com/ocaml/ocaml/issues/12766; both issues need further investigation.

rbwilson-maker commented 1 month ago

@tmcgilchrist Running those commands gives the following:

$ sw_vers
ProductName:    macOS
ProductVersion: 12.6
BuildVersion:   21G115
$ opam --version
2.1.5
$ ocaml --version
The OCaml toplevel, version 4.12.1
$ opam list |grep core
core                       v0.14.1     Industrial strength alternative to OCaml's standard library
core_kernel                v0.14.2     Industrial strength alternative to OCaml's standard library

I am using Rosetta by accident. I installed opam with Homebrew, and for some reason my version of brew is Homebrew 4.2.3 with this config:

HOMEBREW_VERSION: 4.2.3
ORIGIN: https://github.com/Homebrew/brew
HEAD: b3751bca8c4a3c76107d5aa3b75b81db93cf4bb6
Last commit: 3 months ago
Core tap JSON: 11 Jan 21:40 UTC
Core cask tap JSON: 11 Jan 21:40 UTC
HOMEBREW_PREFIX: /usr/local
HOMEBREW_REPOSITORY: /usr/local/Homebrew
HOMEBREW_CELLAR: /usr/local/Cellar
HOMEBREW_CASK_OPTS: []
HOMEBREW_DISPLAY: /private/tmp/com.apple.launchd.dJYcHX4pEu/org.macosforge.xquartz:0
HOMEBREW_MAKE_JOBS: 8
Homebrew Ruby: 3.1.4 => /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/3.1.4/bin/ruby
CPU: octa-core 64-bit westmere
Clang: 14.0.0 build 1400
Git: 2.39.0 => /usr/local/bin/git
Curl: 7.79.1 => /usr/bin/curl
macOS: 12.6-x86_64
CLT: 14.0.0.0.1.1661618636
Xcode: N/A
Rosetta 2: true

I suspect this came from migrating my old computer to this M1 Mac. I was unaware I was using this version of brew until I encountered this bug. Reinstalling the arm64 version of everything is possible but seems risky at this point, since I don’t want to break my environment while I still have many things to get done.

As for a minimal program, I don’t have time to experiment right now but can work on that later. All I know is that removing the segment of code above from my program makes the bug go away, and putting it back, or rewriting it in a few ways with the same logic, always brings the bug back. So I have definitely isolated it to that segment of code.
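One further rewrite that could help narrow this down is a version of the same reachability logic that does not recurse at all. The sketch below is an assumption-laden illustration, not the code from my compiler: it uses stdlib Hashtbl and Stack instead of Core, string labels instead of Symbol.t, and the function name reachable_labels is hypothetical. If this form runs cleanly where the recursive one fails, that would point at recursion depth rather than the hash table operations.

```ocaml
(* Hypothetical stand-ins: string labels instead of Symbol.t. *)
type succ = [ `None | `One of string | `Two of string * string ]

(* Collect every label reachable from [root] using an explicit stack
   instead of recursion, so traversal depth does not use the call stack. *)
let reachable_labels (graph : (string, succ) Hashtbl.t) (root : string)
  : (string, unit) Hashtbl.t =
  let seen = Hashtbl.create 16 in
  let pending = Stack.create () in
  Stack.push root pending;
  while not (Stack.is_empty pending) do
    let block = Stack.pop pending in
    (* Skip blocks already visited, so cycles in the graph terminate. *)
    if not (Hashtbl.mem seen block) then begin
      Hashtbl.replace seen block ();
      match Hashtbl.find graph block with
      | `None -> ()
      | `One l -> Stack.push l pending
      | `Two (l1, l2) -> Stack.push l2 pending; Stack.push l1 pending
    end
  done;
  seen

let () =
  (* A straight-line chain of blocks: b0 -> b1 -> ... -> b199999. *)
  let n = 200_000 in
  let graph = Hashtbl.create n in
  for i = 0 to n - 1 do
    let next =
      if i = n - 1 then `None else `One (Printf.sprintf "b%d" (i + 1))
    in
    Hashtbl.replace graph (Printf.sprintf "b%d" i) next
  done;
  let seen = reachable_labels graph "b0" in
  Printf.printf "reached %d blocks\n" (Hashtbl.length seen)
```

The resulting table can then be used as the filter predicate in place of the Hash_set in the original snippet.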

rbwilson-maker commented 1 month ago

The same issue was found with the language Julia here

tmcgilchrist commented 1 month ago

Could you try creating an opam switch with 4.14.1 and re-running the code on that?

# Create an empty 4.14 switch
$ opam switch create 4.14.1 --no-install
# Enable that switch in your current shell
$ eval $(opam env --switch="4.14.1" --set-switch)
# Then whatever commands you're using to install dependencies.
# You might need to install specific versions like `opam install core.v0.14.1` to match your current versions.

Unfortunately I don't currently have a Mac set up with a Rosetta version of brew/opam/ocaml to debug this. Is the compiler project possibly available as open source?

Some research suggests there are Rosetta bug fixes that might resolve this issue if you're able to update.

rbwilson-maker commented 4 weeks ago

The compiler will hopefully be made open source when it’s done in about two weeks, and then I can return to testing this. Thanks for the help. I’ll try the opam switch and the potential Rosetta update. Are you able to link any of the particular Rosetta bug fixes you were looking at?