nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Cannot build arm64 arch on macOS / M1 host #40302

Open kyr0 opened 2 years ago

kyr0 commented 2 years ago

Version

16.10.0

Platform

Darwin dev 20.6.0 Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:20 PDT 2021; root:xnu-7195.141.6~3/RELEASE_ARM64_T8101 arm64

Subsystem

No response

What steps will reproduce the bug?

./configure --dest-cpu=arm64
make

How often does it reproduce? Is there a required condition?

Always; deterministic

What is the expected behavior?

Main binary and dependent libraries work and report arm64 when checked with lipo -info

What do you see instead?

  g++ -o /Users/aron/iPlugDev/iPlug2/Dependencies/node/out/Release/obj.host/v8_base_without_compiler/deps/v8/src/heap/base/asm/x64/push_registers_asm.o ../deps/v8/src/heap/base/asm/x64/push_registers_asm.cc '-D_GLIBCXX_USE_CXX11_ABI=1' '-DV8_GYP_BUILD' '-DV8_TYPED_ARRAY_MAX_SIZE_IN_HEAP=64' '-D_DARWIN_USE_64_BIT_INODE=1' '-DOPENSSL_NO_PINSHARED' '-DOPENSSL_THREADS' '-DV8_TARGET_ARCH_ARM64' '-DV8_HAVE_TARGET_OS' '-DV8_TARGET_OS_MACOSX' '-DV8_EMBEDDER_STRING="-node.14"' '-DENABLE_DISASSEMBLER' '-DV8_PROMISE_INTERNAL_FIELD_COUNT=1' '-DENABLE_MINOR_MC' '-DOBJECT_PRINT' '-DV8_INTL_SUPPORT' '-DV8_ENABLE_LAZY_SOURCE_POSITIONS' '-DV8_USE_SIPHASH' '-DDISABLE_UNTRUSTED_CODE_MITIGATIONS' '-DV8_WIN64_UNWINDING_INFO' '-DV8_ENABLE_REGEXP_INTERPRETER_THREADED_DISPATCH' '-DV8_SNAPSHOT_COMPRESSION' '-DV8_ENABLE_SYSTEM_INSTRUMENTATION' '-DV8_ENABLE_WEBASSEMBLY' '-DV8_ALLOCATION_FOLDING' '-DV8_ALLOCATION_SITE_TRACKING' '-DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_STATIC' '-DV8_ADVANCED_BIGINT_ALGORITHMS' '-DUCONFIG_NO_SERVICE=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=1' '-DUCONFIG_NO_BREAK_ITERATION=0' -I../deps/v8 -I../deps/v8/include -I/Users/aron/iPlugDev/iPlug2/Dependencies/node/out/Release/obj/gen/inspector-generated-output-root -I../deps/v8/third_party/inspector_protocol -I/Users/aron/iPlugDev/iPlug2/Dependencies/node/out/Release/obj/gen -I/Users/aron/iPlugDev/iPlug2/Dependencies/node/out/Release/obj/gen/generate-bytecode-output-root -I../deps/icu-small/source/common -I../deps/icu-small/source/i18n -I../deps/icu-small/source/tools/toolutil -I../deps/v8/third_party/zlib -I../deps/v8/third_party/zlib/google  -O3 -gdwarf-2 -fstrict-aliasing -mmacosx-version-min=10.13 -Wall -Wendif-labels -W -Wno-unused-parameter -std=gnu++14 -stdlib=libc++ -fno-rtti -fno-exceptions -fno-strict-aliasing -MMD -MF /Users/aron/iPlugDev/iPlug2/Dependencies/node/out/Release/.deps//Users/aron/iPlugDev/iPlug2/Dependencies/node/out/Release/obj.host/v8_base_without_compiler/deps/v8/src/heap/base/asm/x64/push_registers_asm.o.d.raw   -c
<inline asm>:4:8: error: unknown token in expression
  push %rbp
       ^
<inline asm>:4:8: error: invalid operand
  push %rbp
       ^
<inline asm>:5:7: error: unknown token in expression
  mov %rsp, %rbp
      ^
<inline asm>:5:7: error: invalid operand
  mov %rsp, %rbp
      ^
<inline asm>:6:3: error: unrecognized instruction mnemonic, did you mean: ushl, ushr?
  push $0xCDCDCD
  ^
<inline asm>:7:8: error: unknown token in expression
  push %rbx
       ^
<inline asm>:7:8: error: invalid operand
  push %rbx
       ^
<inline asm>:8:8: error: unknown token in expression
  push %r12
       ^
<inline asm>:8:8: error: invalid operand
  push %r12
       ^
<inline asm>:9:8: error: unknown token in expression
  push %r13
       ^
<inline asm>:9:8: error: invalid operand
  push %r13
       ^
<inline asm>:10:8: error: unknown token in expression
  push %r14
       ^
<inline asm>:10:8: error: invalid operand
  push %r14
       ^
<inline asm>:11:8: error: unknown token in expression
  push %r15
       ^
<inline asm>:11:8: error: invalid operand
  push %r15
       ^
<inline asm>:12:7: error: unknown token in expression
  mov %rdx, %r8
      ^
<inline asm>:12:7: error: invalid operand
  mov %rdx, %r8
      ^
<inline asm>:13:7: error: unknown token in expression
  mov %rsp, %rdx
      ^
<inline asm>:13:7: error: invalid operand
  mov %rsp, %rdx
      ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]

Additional information

v8 seems to be building src/heap/base/asm/x64 instead of using the inline assembly implementation in src/heap/base/asm/arm64, which is available.

How to switch the target arch when compiling Node.js?

I'm embedding Node.js in a C++ application which needs to be as lightweight as possible. Therefore I'm trying to get a non-universal build containing only the arm64 binary, including the dependent libs (libuv, v8, etc.).

Am I overlooking something? I thought using --dest-cpu was quite the go-to flag...

I'd also appreciate any comments that help me with this (hacks for the build scripts, etc.)

targos commented 2 years ago

That's weird. Looking at the output, it seems that our configure script detected the host arch as x64 and enabled cross-compilation: g++ -o /Users/aron/iPlugDev/iPlug2/Dependencies/node/out/Release/obj.host/...

targos commented 2 years ago

@kyr0 could you please share the contents of config.gypi that is generated on your machine?

kyr0 commented 2 years ago

@targos Sorry for the delay, I've been busy with my day job ;) Here is the content of the config.gypi. Also, your assumption seems to be correct from what I can see: the script determines x86 as the host architecture, which is definitely not correct in reality. However, I now have an idea why this happens:

My whole Mac was migrated rather than freshly installed. This means I had Homebrew and the automake / autoconf toolchain installed on my Intel Mac; I then ran the migration tool, so I might now be running a toolchain that runs through Rosetta 2. I'm not sure about this, however. It might also be the default case with a fresh install on M1 under other circumstances and not such an edge case. In any case, lipo -info /usr/bin/make gives me: Architectures in the fat file: /usr/bin/make are: x86_64 arm64e.

That said, depending on the implementation of host_arch_cc(), which configure.py uses to determine the host arch (maybe it's just checking the binary format the toolchain runs as, along the lines of "I'm x86, so the host must be x86 too!"?), it must come to the wrong conclusion in my case. It might be safer in general to use something like uname -a or another system/toolchain call on Unix-like host OSs rather than inferring the host arch from the binary format the compiler toolchain is running as, which can give a false positive in cases like Rosetta 2. Wdyt?
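For illustration only, here is a rough sketch of the kind of OS-level probe I mean (not the actual configure.py code; the name host_arch_os and the use of the sysctl.proc_translated key are assumptions on my part). The idea is to ask the kernel for the machine type and, on macOS, also check whether the current process is being translated by Rosetta 2:

# Sketch of an OS-level host-arch probe (not the configure.py implementation).
# Assumption: macOS exposes sysctl.proc_translated, which is 1 for processes
# translated by Rosetta 2 and absent/0 otherwise.
import os
import subprocess
import sys

def host_arch_os():
    machine = os.uname().machine          # 'arm64', 'x86_64', 'aarch64', ...
    if sys.platform == 'darwin' and machine == 'x86_64':
        try:
            translated = subprocess.check_output(
                ['sysctl', '-n', 'sysctl.proc_translated'],
                stderr=subprocess.DEVNULL).strip() == b'1'
        except (OSError, subprocess.CalledProcessError):
            translated = False            # key is missing on Intel Macs
        if translated:
            machine = 'arm64'             # x86_64 process on an arm64 host
    return {'arm64': 'arm64', 'aarch64': 'arm64',
            'x86_64': 'x64', 'i386': 'ia32'}.get(machine, machine)

Note that under Rosetta 2 the process itself still reports x86_64 from uname, which is why the translation flag would be needed to recover the real host.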

I'll triage again in a few hours, try to prove my point, fix it on my side and come up with a PR to fix this. However, introducing changes in this line host_arch = host_arch_win() if os.name == 'nt' else host_arch_cc() could become a potential landmine on other hosts that I cannot test. Any change here should undergo a careful test procedure imho; idk if we have integration tests for this, but that would be amazing... :)

Anyways, here is my local config.gypi: https://pastebin.com/9asbsHNP

kyr0 commented 2 years ago

I think I found the bug. def host_arch_cc()

(Screenshot of host_arch_cc() in configure.py, 2021-10-04 17:45)

It prints:

k[i] __x86_64__ i 1
rtn x64

Which leads to:

host_arch: x64
target_arch: arm64

This all boils down to: k = cc_macros(os.environ.get('CC_host'))

Update: CC_host is None. This leads to CC being executed, which defaults to the cc executable on the PATH, which is:

Arons-MBP:node-16.10.0 aron$ which cc
/usr/bin/cc

And which is itself a dual x86_64 / arm64e universal binary:

Arons-MBP:node-16.10.0 aron$ file /usr/bin/cc
/usr/bin/cc: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/bin/cc (for architecture x86_64):  Mach-O 64-bit executable x86_64
/usr/bin/cc (for architecture arm64e):  Mach-O 64-bit executable arm64e

And it probably selects the first architecture it finds from that one... investigating further
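For reference, a rough stand-in for that macro probe (an assumption of how it behaves, not the verbatim cc_macros() from configure.py): it runs cc -dM -E - against an empty input and collects the predefined macros, so whichever slice of the universal /usr/bin/cc actually executes determines the result:

# Rough stand-in for the cc_macros() probe (assumption, not verbatim configure.py):
# ask the compiler which macros it predefines for an empty translation unit.
import os
import shlex
import subprocess

def probe_cc_macros(cc=None):
    cc = cc or os.environ.get('CC_host') or os.environ.get('CC') or 'cc'
    out = subprocess.run(shlex.split(cc) + ['-dM', '-E', '-'],
                         input='', capture_output=True, text=True,
                         check=True).stdout
    macros = {}
    for line in out.splitlines():         # lines look like '#define __x86_64__ 1'
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == '#define':
            macros[parts[1]] = parts[2] if len(parts) == 3 else ''
    return macros

# When invoked from a Rosetta-2-translated (x86_64) parent process, the x86_64
# slice of /usr/bin/cc runs, so '__x86_64__' is defined and '__aarch64__' is
# not, even though the physical host is arm64.
if __name__ == '__main__':
    m = probe_cc_macros()
    print('__aarch64__' in m, '__x86_64__' in m)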

kyr0 commented 2 years ago

Guess I found it... the output of that parsed cc -dM -E - call on M1 is ['#define _LP64 1',...]

The _LP64 macro is not part of the mapping:

matchup = {
    '__aarch64__' : 'arm64',
    '__arm__'     : 'arm',
    '__i386__'    : 'ia32',
    '__MIPSEL__'  : 'mipsel',
    '__mips__'    : 'mips',
    '__PPC64__'   : 'ppc64',
    '__PPC__'     : 'ppc64',
    '__x86_64__'  : 'x64',
    '__s390x__'   : 's390x',
    '__riscv'     : 'riscv',
  }

Which ultimately leads to the wrong determination of the host architecture.
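Roughly, the detection then returns the arch of the first matchup key it finds among those predefined macros; a minimal reconstruction of that behaviour (not the verbatim configure.py source) looks like this:

# Reconstruction of the lookup behaviour described above (not verbatim configure.py):
# the first matchup key that appears among the compiler's predefined macros wins.
def host_arch_from_macros(macros, matchup):
    for key, arch in matchup.items():
        if key in macros and macros[key] != '0':
            return arch
    return 'ia32'   # fall-through default when nothing matches

# With a Rosetta-translated cc, macros contain '__x86_64__' (and '_LP64'),
# so the probe returns 'x64' even though the machine is an M1.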

Adding '_LP64' : 'arm64'

will fix this nasty bug and lead to the correct result: host_arch = arm64

Would you accept a PR @targos ?

Best, Aron

richardlau commented 2 years ago

I'm not really a macOS user but @AshCripps set up our fat binary builds https://github.com/nodejs/node/blob/7f2cb44d89ce47af19b4a1ba69b9d673fff1e321/Makefile#L981-L1055. I believe this is all run on an M1 Mac outside of Rosetta 2.

targos commented 2 years ago

I do most of my nodejs work with an M1 MBP and it works fine for me, so this is not a general issue

AshCripps commented 2 years ago

Even if you do get the x64 build to run I believe it fails on a V8 compilation step which is why we do arm64 -> x64 and not the other way round in our release builds.

kyr0 commented 2 years ago

I do most of my nodejs work with an M1 MBP and it works fine for me, so this is not a general issue

Yes, I came to the same conclusion now. This only happens in dual-arch toolchain cases where Rosetta 2 emulates an x86_64 platform environment.

Summing up: with the new patch in the PR, it works well in this edge case too. Compilation results in:

Arons-MBP:node-16.10.0 aron$ lipo -info out/Release/libv8_init.a
Non-fat file: out/Release/libv8_init.a is architecture: arm64
Arons-MBP:node-16.10.0 aron$ lipo -info out/Release/libnode.a
Non-fat file: out/Release/libnode.a is architecture: arm64
Arons-MBP:node-16.10.0 aron$ lipo -info out/Release/libuv.a
Non-fat file: out/Release/libuv.a is architecture: arm64
Arons-MBP:node-16.10.0 aron$

kyr0 commented 2 years ago

Even if you do get the x64 build to run I believe it fails on a V8 compilation step which is why we do arm64 -> x64 and not the other way round in our release builds.

@AshCripps When I was not specifying --dest-cpu=arm64 (without my patch), it went down the x86_64 road and built with target_arch = x86_64 and host_arch = x86_64, which resulted in a (from the new understanding) surprising result of a valid x86_64 build. I can try to double-check and verify this if you're interested.

Only specifying --dest-cpu=arm64 (without my patch) led to target_arch = arm64 while host_arch was still wrongly identified as x86_64. This resulted in an arch mismatch that triggered cross-compilation and set x86_64 for v8 -- it then of course broke the compilation, because v8 tried to compile x86_64 inline assembly on an ARM CPU.

The current behaviour (with my patch) is: you don't have to specify --dest-cpu or anything else; I just run ./configure && make -- it detects the host_arch correctly, even in this edge-case situation, and gives me non-fat libs and executables containing only arm64.

kyr0 commented 2 years ago

@AshCripps I'm actually running all permutations I can imagine now... just to re-verify:

a) without the patch, no --dest-cpu flag; but with the edge-case of Rosetta 2 on the toolchain... update: x86_64, non-fat
b) without the patch, with --dest-cpu=arm64 (should lead to the error above); update: crash: tries to build inline assembly for x86_64 (v8) just as above
c) with the patch, no --dest-cpu flag; should lead to the expected result of a non-fat arm64 build; update: arm64, non-fat
d) with the patch, and --dest-cpu flag set to arm64; should have no effect; equal to c); update: equal to c)
e) with the patch, and --dest-cpu flag set to x86_64; that will be interesting; update: crashes; tries to build openssl with inline asm for x86_64

richardlau commented 2 years ago

a) without the patch, no --dest-cpu flag; but with the edge-case of Rosetta 2 on the toolchain...

I'd kind of expect a Rosetta 2 toolchain to build x64 binaries -- isn't the point of Rosetta 2 to pretend to be x64?

kyr0 commented 2 years ago

@richardlau Well, that is a valid argument from my point of view; actually, I think both mental models could work. One could argue that Rosetta 2's intention is to make x86_64 work on arm64, therefore the output of a compilation should be arm64 by default. Or one could argue that Rosetta runs x86_64 on ARM and therefore the output should be x86_64. Hmm...

But even if we take the second mental model, and assume that when running via Rosetta, the compilation output should be x86_64, then it would be nice if the compilation wouldn't crash when you specify the --dest-cpu target arm64, right?

To make this work, my patch would need another condition to check for --dest-cpu as well

Alternatively, the PR and issue could be closed, because it's an edge case; but then I'd suggest at least documenting it in the build README for Mac / M1. I could also come up with a small PR for that.

Wdyt?

richardlau commented 2 years ago

But even if we take the second mental model, and assume that when running via Rosetta, the compilation output should be x86_64, then it would be nice if the compilation wouldn't crash when you specify the --dest-cpu target arm64, right?

Yeah, for this I would expect the build to be cross-compiling for arm64 (but with an x64 host toolchain). If that doesn't work (I think @AshCripps may have alluded to it not working when we tried before) then there's possibly a bug somewhere in the build scripts, but it's unlikely to be configure. It may even be an issue upstream in V8.

AshCripps commented 2 years ago

I think this is the wrong way to solve the issue - as far as I understand, you are trying to compile arm64 from within an x64 environment (Rosetta), right? To me the fix should involve the --dest-cpu flag and not fudging the detection to detect the host underneath Rosetta - because that could lead to awkwardness when trying to build x64 inside Rosetta. In fact this will break our release builds.

djmarcin commented 2 years ago

Even if you do get the x64 build to run I believe it fails on a V8 compilation step which is why we do arm64 -> x64 and not the other way round in our release builds.

This is what I found as well, but it seems to be because V8 is being compiled with CC_host but passes -DV8_TARGET_ARCH_ARM64 which corresponds to the --dest-cpu=arm64 flag. It seems like this should be fixable, but this is kind of a tangent so I'll open a separate issue -- https://github.com/nodejs/node/issues/40350