indygreg / python-build-standalone

Produce redistributable builds of Python
BSD 3-Clause "New" or "Revised" License

Apple LTO distributions may not weakly reference symbols on CPython 3.10+ #216

Closed indygreg closed 4 months ago

indygreg commented 4 months ago

When attempting to upgrade the GitHub Actions runners and build against macOS SDKs newer than the old versions we currently use, various builds started failing validation: our custom check that every strongly bound symbol is present in the targeted macOS SDK versions was failing.

Strangely, the failure was only occurring on LTO optimized (either the lto or pgo+lto optimization level) builds of CPython 3.10 or newer. Not 3.8 or 3.9. Nor any non-LTO build configuration on 3.10+.

The symbols most likely to be strongly bound instead of weakly bound are mknodat and mkfifoat. However, pretty much every weakly bound CPython symbol is affected once the macOS SDK is upgraded. (PR #161 suggested disabling mknodat and mkfifoat globally as a workaround, but it turns out this problem goes beyond those two symbols.)
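
For anyone who wants to spot-check a binary by hand, here is a rough sketch of the kind of check involved. It is not the project's actual validation code: the function names are made up for illustration, and it assumes the input looks like the undefined-symbol lines printed by `nm -m` on macOS (for example `(undefined) weak external _mkfifoat (from libSystem)`). The exact format may vary between tool versions.

```python
import re

# Assumed line shape (from `nm -m` on macOS), e.g.
#   "                 (undefined) weak external _mkfifoat (from libSystem)"
# The "[^)]*" tolerates extra annotations inside the parentheses.
UNDEFINED_RE = re.compile(
    r"^\s*\(undefined[^)]*\)\s+(?P<weak>weak\s+)?external\s+(?P<name>\S+)"
)


def classify_undefined(nm_output):
    """Split undefined symbols into (strong, weak) sets of names."""
    strong, weak = set(), set()
    for line in nm_output.splitlines():
        match = UNDEFINED_RE.match(line)
        if match is None:
            continue  # defined symbols and unrelated lines are ignored
        target = weak if match.group("weak") else strong
        target.add(match.group("name"))
    return strong, weak


def unexpected_strong(nm_output, must_be_weak):
    """Return symbols that should be weak references but are strongly bound."""
    strong, _ = classify_undefined(nm_output)
    return sorted(strong & set(must_be_weak))


# Hypothetical sample input, not taken from a real build.
SAMPLE = """\
                 (undefined) external _open (from libSystem)
                 (undefined) weak external _mkfifoat (from libSystem)
                 (undefined) external _mknodat (from libSystem)
"""
```

With the sample above, `unexpected_strong(SAMPLE, ["_mkfifoat", "_mknodat"])` flags only `_mknodat`, which is the shape of failure described here: a symbol that should be a weak reference ends up strongly bound.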

indygreg commented 4 months ago

I think we're running into LLVM / ld bug https://github.com/llvm/llvm-project/issues/52778, where the linker doesn't preserve weak references during LTO. This bug was introduced in LLVM 13 and fixed in LLVM 14.

Our custom LLVM toolchain doesn't (yet) ship lld and clang is picking up /usr/bin/ld as the linker.

/usr/bin/ld -v reports:

@(#)PROGRAM:ld  PROJECT:dyld-1015.7
BUILD 16:59:34 Oct  1 2023
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
will use ld-classic for: armv6 armv7 armv7s arm64_32 i386 armv6m armv7k armv7m armv7em
LTO support using: LLVM version 15.0.0 (static support for 29, runtime is 29)
TAPI support using: Apple TAPI version 15.0.0 (tapi-1500.0.12.3)

Those are Apple's LLVM versions. While I think Apple's LLVM 15.0 should be new enough, it is possible that /usr/bin/ld is just too old and doesn't contain the bug fix.
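
To compare linkers across runners, the relevant fields can be pulled out of that banner programmatically. The following is a minimal sketch that only parses text in the shape shown above; the function name and the returned keys are made up for illustration.

```python
import re


def parse_ld_banner(text):
    """Extract the linker project and LTO LLVM version from `ld -v` text."""
    info = {}
    project = re.search(r"PROJECT:(\S+)", text)
    if project:
        # "dyld-1015.7" -> name "dyld", version "1015.7"
        name, _, version = project.group(1).partition("-")
        info["project_name"] = name
        info["project_version"] = version
    lto = re.search(r"LTO support using: LLVM version (\d+(?:\.\d+)*)", text)
    if lto:
        info["lto_llvm_version"] = lto.group(1)
    return info


# The first lines of the banner quoted above.
BANNER = """\
@(#)PROGRAM:ld  PROJECT:dyld-1015.7
BUILD 16:59:34 Oct  1 2023
LTO support using: LLVM version 15.0.0 (static support for 29, runtime is 29)
"""
```

For the banner above this yields project name `dyld`, project version `1015.7`, and LTO LLVM version `15.0.0`. To feed it live output, capture both stdout and stderr of `/usr/bin/ld -v`, since I have not checked which stream the banner is written to.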

I produced an LLVM toolchain with lld and macOS CI seems to pass with this toolchain. So I'm moderately confident this is the issue.

That's a pretty nasty bug for Apple to be shipping in their Xcode toolchain :/

indygreg commented 4 months ago

OK. So I triggered CI against my pre-release LLVM toolchains with LLD.

The good news is x86-64 builds appear to be handling weak symbols properly when using the bundled lld linker.

The bad news is m4 craps out when building on macOS ARM. I'm able to reproduce the issue locally. Best I can tell m4's configure just doesn't know how to handle Apple ARM.

It initially fails the "checking for socklen_t equivalent" check. I added ac_cv_type_socklen_t=yes to force it to completion. Configure then passes, but make fails in C compilation land. Comparing configure output before and after, various checks give different results depending on whether lld is the default linker.

Most weird. It looks like m4's configure script just doesn't know how to handle modern lld on Apple ARM hardware. This is the kind of problem I expected to find ~3.5 years ago when Apple ARM machines launched. I thought all notable software had accounted for this. But I guess m4 is a laggard?

indygreg commented 4 months ago

Ok. I switched my custom LLVM toolchains to build on the new GitHub Actions Apple ARM runners. For whatever reason LLVM's CMake is detecting the host triple as x86_64-apple-darwin23.2.0. That gets inherited as the default target triple, which breaks m4 configure because it attempts to cross compile when we're not cross compiling.

How LLVM's CMake is thinking the GitHub Actions ARM runners are x86-64 I'm not yet sure.

What a yak shave :/

indygreg commented 4 months ago

From a macos-14 ARM runner:

$ uname -a
Darwin Mac-1708658615548.local 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:54:25 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_VMAPPLE x86_64
$ sh llvm/cmake/config.guess
x86_64-apple-darwin23.2.0

Why this is happening I'm not sure. But at least we know where things are failing.
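
One thing that stands out in the output above: the kernel string says RELEASE_ARM64 while the machine field says x86_64. A possible explanation, which is an assumption and not something confirmed here, is that the shell is running under Rosetta translation, in which case uname reports x86_64. The sketch below shows how that could be checked using the macOS sysctl keys `sysctl.proc_translated` and `hw.optional.arm64`; the helper names are made up for illustration.

```python
import subprocess


def sysctl_int(name):
    """Return an integer sysctl value, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", name], capture_output=True, text=True, check=True
        )
        return int(result.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def effective_arch(uname_machine, proc_translated, hw_arm64):
    """Guess the real host architecture from uname and sysctl values.

    proc_translated == 1 means the process runs under Rosetta;
    hw_arm64 == 1 means the hardware is ARM64 capable.
    """
    if uname_machine == "x86_64" and (proc_translated == 1 or hw_arm64 == 1):
        return "arm64"
    return uname_machine


def detect_host_arch():
    """Query the live system (only meaningful on macOS)."""
    machine = subprocess.run(
        ["uname", "-m"], capture_output=True, text=True, check=True
    ).stdout.strip()
    return effective_arch(
        machine,
        sysctl_int("sysctl.proc_translated"),
        sysctl_int("hw.optional.arm64"),
    )
```

If `detect_host_arch()` returns arm64 on the runner while `uname -m` prints x86_64, the host triple reported by config.guess is wrong for the hardware and the default target triple would need to be set explicitly.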

indygreg commented 4 months ago

I'm spinning new LLVM toolchains now.

Since the test build of the new LLVM toolchain with lld passed CI on x86-64, I'm optimistic the aarch64 builds will just work once the new LLVM toolchain is in place.

glandium commented 4 months ago

Note that the LLVM bug you referenced is an LLD bug, and /usr/bin/ld is not LLD. It's a completely different linker; it's not even the ld64 referenced in the LLD bug.

indygreg commented 4 months ago

Oh right. You just reminded me that Apple implemented a new, faster linker, which they announced at WWDC in June 2023.

This makes a lot more sense. And explains why weak symbol issues only surfaced when upgrading CI runners to newer macOS versions.

For whatever reason I thought ld was lld under the hood.