python / cpython

The Python programming language
https://www.python.org
Other
63.34k stars 30.32k forks source link

Error compiling HACL* Blake2 support for macOS universal binaries #123748

Closed freakboy3742 closed 1 month ago

freakboy3742 commented 1 month ago

Bug report

Bug description:

99108 tracks the addition of a native HACL implementation to CPython. #119316 added an implementation of Blake2 to hashlib.

This compiles fine on single architecture macOS builds (as verified by CI); but universal2 builds running on an ARM64 laptop generate a compilation error:

To reproduce the problem: on a macOS machine, configure the build with:

$ configure --enable-universalsdk="`xcrun --show-sdk-path`" --with-universal-archs=universal2
$ make

This will eventually yield the compilation error:

gcc -c -I../../../Modules/_hacl -I../../../Modules/_hacl/include -D_BSD_SOURCE -D_DEFAULT_SOURCE -fno-strict-overflow -Wsign-compare -Wunreachable-code -DNDEBUG -g -O3 -Wall -arch arm64 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk    -fstack-protector-strong -std=c11 -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden  -I../../../Include/internal -I../../../Include/internal/mimalloc -IObjects -IInclude -IPython -I. -I../../../Include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk     -mavx2 -DHACL_CAN_COMPILE_VEC256 -o Modules/_hacl/Hacl_Hash_Blake2b_Simd256.o ../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c
In file included from ../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:26:
In file included from ../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:40:
../../../Modules/_hacl/internal/../Hacl_Hash_Blake2b_Simd256.h:56:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *fst;
  ^
../../../Modules/_hacl/internal/../Hacl_Hash_Blake2b_Simd256.h:57:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *snd;
  ^
In file included from ../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:26:
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:44:32: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
Hacl_Hash_Blake2b_Simd256_init(Lib_IntVector_Intrinsics_vec256 *hash, uint32_t kk, uint32_t nn);
                               ^
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:49:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *wv,
  ^
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:50:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *hash,
  ^
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:59:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *wv,
  ^
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:60:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *hash,
  ^
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:71:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *hash
  ^
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:76:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *st,
  ^
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:83:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *st
  ^
../../../Modules/_hacl/internal/Hacl_Hash_Blake2b_Simd256.h:86:1: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
Lib_IntVector_Intrinsics_vec256 *Hacl_Hash_Blake2b_Simd256_malloc_with_key(void);
^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:34:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *wv,
  ^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:35:3: error: unknown type name 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 *hash,
  ^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:53:3: error: use of undeclared identifier 'Lib_IntVector_Intrinsics_vec256'
  Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_zero;
  ^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:72:3: error: use of undeclared identifier 'mask'
  mask =
  ^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:73:5: error: call to undeclared function 'Lib_IntVector_Intrinsics_vec256_load64s'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    Lib_IntVector_Intrinsics_vec256_load64s(FStar_UInt128_uint128_to_uint64(totlen),
    ^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:77:33: error: use of undeclared identifier 'Lib_IntVector_Intrinsics_vec256'; did you mean 'Lib_IntVector_Intrinsics_vec256_load64s'?
  memcpy(wv, hash, 4U * sizeof (Lib_IntVector_Intrinsics_vec256));
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                Lib_IntVector_Intrinsics_vec256_load64s
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/secure/_string.h:63:33: note: expanded from macro 'memcpy'
                __builtin___memcpy_chk (dest, __VA_ARGS__, __darwin_obsz0 (dest))
                                              ^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:73:5: note: 'Lib_IntVector_Intrinsics_vec256_load64s' declared here
    Lib_IntVector_Intrinsics_vec256_load64s(FStar_UInt128_uint128_to_uint64(totlen),
    ^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:78:3: error: use of undeclared identifier 'Lib_IntVector_Intrinsics_vec256'; did you mean 'Lib_IntVector_Intrinsics_vec256_load64s'?
  Lib_IntVector_Intrinsics_vec256 *wv3 = wv + 3U;
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Lib_IntVector_Intrinsics_vec256_load64s
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:73:5: note: 'Lib_IntVector_Intrinsics_vec256_load64s' declared here
    Lib_IntVector_Intrinsics_vec256_load64s(FStar_UInt128_uint128_to_uint64(totlen),
    ^
../../../Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c:78:36: error: use of undeclared identifier 'wv3'
  Lib_IntVector_Intrinsics_vec256 *wv3 = wv + 3U;
                                   ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [Modules/_hacl/Hacl_Hash_Blake2b_Simd256.o] Error 1

From what I can make out, the error comes from the detection of -mavx2 support. On a bare configure on an ARM64 machine, -mavx2 support is apparently unsupported:

configure:30537: checking whether C compiler accepts -mavx2
configure:30557: gcc -c  -Werror -mavx2  conftest.c >&5
clang: error: argument unused during compilation: '-mavx2' [-Werror,-Wunused-command-line-argument]

and as a result, the Hacl_Hash_Blake2b_Simd256.c module isn't compiled. However, when universal support is enabled, -mavx2 is supported:

configure:30537: checking whether C compiler accepts -mavx2
configure:30557: gcc -c -arch arm64 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk  -Werror -mavx2 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk  conftest.c >&5
configure:30557: $? = 0
configure:30566: result: yes

and the module is included. Based on recent configure logs for x86_64 macOS builds, it appears that -mavx2 is supported on x86_64.

I'm not sufficiently familiar with the subject matter to comment on whether the fix here is to fix the autoconf detection to disable the problematic module on universal builds, or to correct the implementation so that it can compile for universal builds.

Tagging @msprotz @R1kM as the authors of the recent HACL* changes.

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS

Linked PRs

ned-deily commented 1 month ago

macOS universal builds are a special form of cross-compile builds that take advantage of the built-in support in Apple's Xcode or Command Line Tools for Xcode for automatic multi-architecture building and linking into multi-arch fat binaries. One potential gotcha of this approach is when there are architecture-dependent configure tests (in configure) that depend on the execution of test code on the build machine. Of course, standard cross-compile builds on macOS and other platforms have the same gotcha. For macOS universal builds, this kind of a problem has occasionally arisen in the past and one solution that was adopted was to provide an additional header file, Include/pymacconfig.h, whose purpose is to move "some of the autoconf magic to compile-time when building on macOS", so overriding problematic configure-time build tests with conditional code. If it turns out that this is that case here (and it seems likely that it would be), it may be possible to replace a problematic autoconf test with code in this include file.

Note, that this is a potential release blocker issue as the macOS installer binaries we provide with each release are built as a universal2 build. The first 3.14 release, 3.14.0a1, is currently scheduled for 2024-10-15.

msprotz commented 1 month ago

Right, except that here it's made even more complicated by the fact that some files should be included in the Intel builds but not the ARM builds. I'm not sure how to achieve that except to special-case the list of files that go into the build with an Apple-specific bit of logic...?

ned-deily commented 1 month ago

What kind of files and at what point are they used: configuring, building, installing, executing?

msprotz commented 1 month ago

The files Modules/_hacl/Hacl_Hash_Blake2s_Simd128.c and Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c should only be compiled on x64/x86. They are then linked into the python executable. They may be used at runtime if the CPU that python is then executed on has the right support.

So, in more detail:

Let me know if I can provide more details? Thanks!

freakboy3742 commented 1 month ago

The files Modules/_hacl/Hacl_Hash_Blake2s_Simd128.c and Modules/_hacl/Hacl_Hash_Blake2b_Simd256.c should only be compiled on x64/x86. They are then linked into the python executable. They may be used at runtime if the CPU that python is then executed on has the right support.

Based on this, it sounds like there's no reasonable prospect of getting the Simd128 and Simd256 files to compile on ARM64. AFAIK it's not possible to compile a file for one architecture and link into a universal build - I might be wrong on that, but if I am, I'm going to guess the configure script is going to be messy.

On that basis, it sounds like universal builds won't be able to support those options. That's easy enough to override in the configure process - an extra if block checking for universal on Darwin can disable the option.

I agree it's a little weird that a pure x86-64 build will have support when ARM and universal won't, but given the new platform that the majority of macOS users are on won't support it, I don't think many users will notice the discrepancy.

msprotz commented 1 month ago

How about putting an #ifdef in those files that makes them empty when on arm64? Would that help? Like:

#if define(HACL_CAN_COMPILE_SIMD128)
... previous contents of the file ...
#endif

knowing that HACL_CAN_COMPILE_SIMD128 is not defined on ARM64.

@gpshead might have further thoughts

freakboy3742 commented 1 month ago

How about putting an #ifdef in those files that makes them empty when on arm64? Would that help? Like:

#if define(HACL_CAN_COMPILE_SIMD128)
... previous contents of the file ...
#endif

knowing that HACL_CAN_COMPILE_SIMD128 is not defined on ARM64.

That's the thing though - universal builds are implemented as a single compilation passes, so HACL_CAN_COMPILE_SIMD128/256 is defined. These are turned on because the compiler flags that enable them are legal compiler options when x86_64 is one of the compiler targets, so the constant is defined by the single-pass configure script.

If we were going down the #define path, it would need to be based on #if defined(__APPLE__) && defined(__arm64__), or something like that. However, I think catching this at the configure level makes more sense, even though it does mean the blake2b simd128/256 implementations won't be available for universal builds running on x86_64, where they otherwise would be.

freakboy3742 commented 1 month ago

A potential fix based on an improved autoconf check: #123927

gpshead commented 1 month ago

Does macOS really require all files in a universal2 build to be the same? that seems silly. it is common to separate arch specific code into its own files.

editoral comments about their toolchain choices aside (clearly they channeled practicality vs purity there), if that is true, just making the simd files have C preprocessor checks that effectively make them empty when the aarch64 side of compilation is running makes sense. the x86_64 side of the compilation (surely there are two independent compiler passes running behind the scenes of their cc -arch x -arch y command line) will still be happy and compile useful code instead of an empty object file for those.

If we were going down the #define path, it would need to be based on #if defined(__APPLE__) && defined(__arm64__), or something like that. However, I think catching this at the configure level makes more sense, even though it does mean the blake2b simd128/256 implementations won't be available for universal builds running on x86_64, where they otherwise would be.

I'd actually prefer the #define path. A configure test cannot understand the dual compilation and would unnecessarily leave Intel performance on the table. There are a ton of Intel mac's out there and I assume they'll probably be supported until 2030. It becomes more important in the future if/when we get arch specific accelerated HACL* SHA implementations so that hashlib doesn't need OpenSSL to be performant. (blake2 is less important vs those "Standard"s)

freakboy3742 commented 1 month ago

Does macOS really require all files in a universal2 build to be the same? that seems silly. it is common to separate arch specific code into its own files.

Not unless you're compiling multiple architectures in a single pass, as CPython does. If we compiled for x86_64, then compiled for ARM64, then merged the two binaries into a universal binary, the problem wouldn't exist. However, the single-pass autoconf-based build determines the modules to be compiled, and the flags to be passed in to that compile, based on a single pass compiler check.

I'd actually prefer the #define path. A configure test cannot understand the dual compilation and would unnecessarily leave Intel performance on the table. There are a ton of Intel mac's out there and I assume they'll probably be supported until 2030. It becomes more important in the future if/when we get arch specific accelerated HACL* SHA implementations so that hashlib doesn't need OpenSSL to be performant. (blake2 is less important vs those "Standard"s)

Ok - I'll take a look and see what I can make work. My first attempt at doing this failed, but I didn't look too closely at why - I'm probably missing something obvious.

msprotz commented 1 month ago

Thanks, I agree that the pass with the #defines sounds better. This can probably be done with a stub file Hacl_Hash_Blake2b_Simd256_Universal.c that does

#if defined (...)
#include Hacl_Hash_Blake2b_Simd256.c
#endif

so as to leave the ingestion of upstream HACL* unchanged, without having to hack _hacl/refresh.sh in cpython.

gpshead commented 1 month ago

Q: do we have a macOS universal2 buildbot?

ned-deily commented 1 month ago

Q: do we have a macOS universal2 buildbot?

Not that I'm aware of. It's easy enough to add the appropriate options to CPython configure to trigger a universal2 build. What currently isn't so straightforward is the availability of universal builds of the external third-party libraries that are needed by the standard library and that are not already supplied by macOS: mainly libssl and libcrypto from OpenSSL, liblzma, libmpdec, Tk, gdbm (if GPL-licensing isn't an issue), and potentially newer versions of a few others (sqlite3, ncurses, etc). Homebrew does not provide universal builds (and its decision to use totally different install prefixes for Intel and Apple Silicon builds complicates this). MacPorts does support universal builds of most/all of these but does not provide pre-built universal binaries. At some point soon, we may be able to leverage builds needed for the macOS installer and, possibly, for iOS binaries, as well.