indygreg / python-build-standalone

Produce redistributable builds of Python
Mozilla Public License 2.0
1.98k stars 127 forks source link

Fix missing symbol error when running older versions of MacOS #122

Closed ddeville closed 2 years ago

ddeville commented 2 years ago

Python (at least 3.8) has trouble dealing with weak symbols that are defined in macos availability macros and ends up linking them strongly which causes it to fail launching on older OSes since it can't find the symbols at runtime. I've noticed issues on 10.10 and 10.11 due to symbols added in 10.12 (mostly getentropy) but this PR attempts to deal with this in a slightly more generic manner.

I know that Python has been attempting to fix this and maybe 3.9 doesn't have that issue but I haven't looked too closely.

indygreg commented 2 years ago

I'll look at the diff later, but my recollection is CPython didn't do weak linking properly until a 3.9 point release. I believe things were finally implemented somewhat properly when Apple M1s became a thing and they were forced to use the macOS 11.0 SDK to target M1 and this exposed them to portability issues when x86-64 binaries were built with the 11.0 SDK.

It took them a while to do weak linking properly, as there were definitely random symbols not doing it properly. And my recollection is some initial attempts at introducing weak linking didn't actually work properly. I think that's why there is code in this repo for force disabling specific symbols.

The code to support weak linking is scattered all over the CPython code base and much of it was backported. Since much of it was all tied up in M1 support and 3.8 didn't get those backports until much later, I wouldn't at all be surprised if 3.8 has bugs in this area.

As for this project, I very much dislike the fact that we prevent the use of some new symbols instead of using weak linking. This does have performance implications and I'd love to see us do the right thing.

A thing I struggle with is automated enforcement of symbol visibility. The Rust code in this repo for verifying some binary properties does detect some banned symbols. But it doesn't currently recognize - and allow - weakly linked symbols. I'd love to change this. I suspect there's even room to parse .tbd files in macOS SDKs to verify symbol availability and require weak linking when appropriate. I even wrote a Rust crate (https://crates.io/crates/text-stub-library) for parsing .tbd files with this potential goal in mind.

Another issue here is presence of weak symbols impacts downstream projects linking the raw object files. We might need to annotate weak symbols or other linker metadata in the PYTHON.json file so weak symbols can be handled accordingly. Otherwise, I believe weak symbols can get promoted when linking the object files. So if we introduce weak symbols, we need to be sure PyOxidizer handles them correctly.

indygreg commented 2 years ago

Oh, and something else to keep in mind is that because this project doesn't (yet) produce universal/fat Mach-O binaries, we can get away with requiring macOS 11.0+ targeting for aarch64. So we likely don't need to disable symbols on aarch64 Darwin unless they were introduced past macOS 11.0.

indygreg commented 2 years ago

I have some patches locally to look for weak references in the Mach-O files. Throwing it at the most recent distributions exposes the following weak references:

3.8 aarch64:

python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/install/lib/libpython3.8.dylib _fstatvfs
python/install/lib/libpython3.8.dylib _inet_aton
python/install/lib/libpython3.8.dylib _lchown
python/install/lib/libpython3.8.dylib _statvfs

3.8 x86-64

python/build/lib/libffi.a:closures.o _mkostemp
python/build/lib/liblzma.a:liblzma_la-stream_encoder_mt.o _clock_gettime
python/build/lib/libtcl8.6.a:tclMacOSXNotify.o ___darwin_check_fd_set_overflow
python/build/lib/libtk8.6.a:tkMacOSXColor.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXWindowEvent.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXWm.o _NSAppearanceNameDarkAqua
python/install/lib/libpython3.8.dylib _NSAppearanceNameDarkAqua
python/install/lib/libpython3.8.dylib ___darwin_check_fd_set_overflow
python/install/lib/libpython3.8.dylib __dyld_shared_cache_contains_path
python/install/lib/libpython3.8.dylib _clock_getres
python/install/lib/libpython3.8.dylib _clock_gettime
python/install/lib/libpython3.8.dylib _clock_settime
python/install/lib/libpython3.8.dylib _dispatch_once_f
python/install/lib/libpython3.8.dylib _faccessat
python/install/lib/libpython3.8.dylib _fchmodat
python/install/lib/libpython3.8.dylib _fchownat
python/install/lib/libpython3.8.dylib _fdopendir$INODE64
python/install/lib/libpython3.8.dylib _fstatat$INODE64
python/install/lib/libpython3.8.dylib _fstatvfs
python/install/lib/libpython3.8.dylib _getentropy
python/install/lib/libpython3.8.dylib _inet_aton
python/install/lib/libpython3.8.dylib _lchown
python/install/lib/libpython3.8.dylib _linkat
python/install/lib/libpython3.8.dylib _mkdirat
python/install/lib/libpython3.8.dylib _mkostemp
python/install/lib/libpython3.8.dylib _openat
python/install/lib/libpython3.8.dylib _readlinkat
python/install/lib/libpython3.8.dylib _renameat
python/install/lib/libpython3.8.dylib _statvfs
python/install/lib/libpython3.8.dylib _symlinkat
python/install/lib/libpython3.8.dylib _unlinkat

3.9 aarch64:

python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/install/lib/libpython3.9.dylib _inet_aton

3.9 x86-64

python/build/lib/libffi.a:closures.o _mkostemp
python/build/lib/liblzma.a:liblzma_la-stream_encoder_mt.o _clock_gettime
python/build/lib/libtcl8.6.a:tclMacOSXNotify.o ___darwin_check_fd_set_overflow
python/build/lib/libtk8.6.a:tkMacOSXColor.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXWindowEvent.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXWm.o _NSAppearanceNameDarkAqua
python/install/lib/libpython3.9.dylib _NSAppearanceNameDarkAqua
python/install/lib/libpython3.9.dylib ___darwin_check_fd_set_overflow
python/install/lib/libpython3.9.dylib __dyld_shared_cache_contains_path
python/install/lib/libpython3.9.dylib _clock_getres
python/install/lib/libpython3.9.dylib _clock_gettime
python/install/lib/libpython3.9.dylib _clock_settime
python/install/lib/libpython3.9.dylib _dispatch_once_f
python/install/lib/libpython3.9.dylib _faccessat
python/install/lib/libpython3.9.dylib _fchmodat
python/install/lib/libpython3.9.dylib _fchownat
python/install/lib/libpython3.9.dylib _fdopendir$INODE64
python/install/lib/libpython3.9.dylib _fstatat$INODE64
python/install/lib/libpython3.9.dylib _futimens
python/install/lib/libpython3.9.dylib _getentropy
python/install/lib/libpython3.9.dylib _inet_aton
python/install/lib/libpython3.9.dylib _linkat
python/install/lib/libpython3.9.dylib _mkdirat
python/install/lib/libpython3.9.dylib _mkostemp
python/install/lib/libpython3.9.dylib _openat
python/install/lib/libpython3.9.dylib _readlinkat
python/install/lib/libpython3.9.dylib _renameat
python/install/lib/libpython3.9.dylib _symlinkat
python/install/lib/libpython3.9.dylib _unlinkat
python/install/lib/libpython3.9.dylib _utimensat

3.10 aarch64:

python/build/Modules/socketmodule.o _inet_aton
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/install/lib/libpython3.10.dylib _inet_aton
python/install/lib/python3.10/config-3.10-darwin/libpython3.10.a:socketmodule.o _inet_aton

3.10 x86-64

python/build/lib/libffi.a:closures.o _mkostemp
python/build/lib/liblzma.a:liblzma_la-stream_encoder_mt.o _clock_gettime
python/build/lib/libtcl8.6.a:tclMacOSXNotify.o ___darwin_check_fd_set_overflow
python/build/lib/libtk8.6.a:tkMacOSXColor.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___block_descriptor_48_e8_32o_e8_v16?0q8l
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___copy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXDialog.o ___destroy_helper_block_e8_32o
python/build/lib/libtk8.6.a:tkMacOSXWindowEvent.o _NSAppearanceNameDarkAqua
python/build/lib/libtk8.6.a:tkMacOSXWm.o _NSAppearanceNameDarkAqua
python/install/lib/libpython3.10.dylib _NSAppearanceNameDarkAqua
python/install/lib/libpython3.10.dylib ___darwin_check_fd_set_overflow
python/install/lib/libpython3.10.dylib __dyld_shared_cache_contains_path
python/install/lib/libpython3.10.dylib _clock_getres
python/install/lib/libpython3.10.dylib _clock_gettime
python/install/lib/libpython3.10.dylib _clock_settime
python/install/lib/libpython3.10.dylib _dispatch_once_f
python/install/lib/libpython3.10.dylib _faccessat
python/install/lib/libpython3.10.dylib _fchmodat
python/install/lib/libpython3.10.dylib _fchownat
python/install/lib/libpython3.10.dylib _fdopendir$INODE64
python/install/lib/libpython3.10.dylib _fstatat$INODE64
python/install/lib/libpython3.10.dylib _futimens
python/install/lib/libpython3.10.dylib _getentropy
python/install/lib/libpython3.10.dylib _inet_aton
python/install/lib/libpython3.10.dylib _linkat
python/install/lib/libpython3.10.dylib _mkdirat
python/install/lib/libpython3.10.dylib _mkostemp
python/install/lib/libpython3.10.dylib _openat
python/install/lib/libpython3.10.dylib _readlinkat
python/install/lib/libpython3.10.dylib _renameat
python/install/lib/libpython3.10.dylib _symlinkat
python/install/lib/libpython3.10.dylib _unlinkat
python/install/lib/libpython3.10.dylib _utimensat

I just thought this would be an interesting reference to have. In theory, all of the weak references should correspond to symbols in macOS 10.10+ (at least for x86-64 builds) because one would think CPython would only make the symbol weak when targeting an older macOS platform level than when the symbol was introduced. However, I'm not sure if this is actually true.

indygreg commented 2 years ago

I haven't forgotten about this PR!

I've been wanting to add automation that validates macOS portability of the resulting binaries for a while now and this PR spurred me to do it. Partially because unwanted symbols have caused errors and regressions. Partially because I want to do the right thing and weakly link as much as possible so distributions run on modern macOS get the benefits the code is capable of.

Anyway, as the timeline shows I just pushed some commits that attempt to validate undefined Mach-O symbols against the state as advertised by the macOS SDKs. I think I've got non-weak symbol validation working against SDKs 10.10+. No 10.9 at the moment because 10.9 doesn't ship YAML .tbd files and we'll have to parse actual Mach-O to do the symbol presence verification. And the checks aren't yet running in CI.

The code as written doesn't show any problems with missing symbols for current distributions for all SDK versions 10.10+.

So now I just have more questions.

Critically, why are you getting undefined symbols but my validation isn't saying they are missing? Either you are doing something causing an introduction of the non-weak symbols and/or my validation code isn't correct.

Could you please provide more details to help debug this?

Which symbol(s) are undefined? And what are steps to reproduce the undefined symbol errors?

ddeville commented 2 years ago

Thanks for the update!

I have a small binary built with pyoxidizer that packages some pip dependencies and mostly acts as Python. When launching this executable on 10.10 and 10.11 I would get a linking error because of getentropy:

dyld: lazy symbol binding failed: Symbol not found: _getentropy
  Referenced from: /Users/vagrant/Desktop/pybox-2022-03-28-1648505173-darwin-x86_64/./pybox
  Expected in: /usr/lib/libSystem.B.dylib

Looking at the symbols, it's undefined but doesn't seem to be weakly linked, which would explain the above.

osx-10-10-a19b635d:pybox-2022-03-28-1648505173-darwin-x86_64 vagrant$ nm pybox | grep getentropy
000000010117a5e8 d __ZN3std3sys4unix4rand3imp21getentropy_fill_bytes5DLSYM17h08e5edb48162f7ffE
                 U _getentropy
0000000101197c84 b _py_getentropy.getentropy_works

(the py_getentropy.getentropy_works seems to imply that it has the right runtime detection but the symbol being strongly linked means that it can't even load the executable)

Let me try to first put together a small repro case so that I can also try it against your latest commits.

ddeville commented 2 years ago

ok here's a small repro case:

$ cat pyoxidizer.bzl
def make_exe():
    dist = default_python_distribution(python_version="3.8")
    policy = dist.make_python_packaging_policy()
    python_config = dist.make_python_interpreter_config()
    python_config.config_profile = "python"
    exe = dist.to_python_executable(
        name="entronope",
        packaging_policy=policy,
        config=python_config,
    )
    return exe

register_target("exe", make_exe)
resolve_targets()

$ cat build.sh
#!/usr/bin/env bash
set -euo pipefail

pyoxidizer_url="https://github.com/indygreg/PyOxidizer/releases/download/pyoxidizer%2F0.20.0/PyOxidizer-0.20.0-exe-macos-universal.zip"
target_triple="x86_64-apple-darwin"

# This shouldn't be necessary but let's be explicit
export MACOSX_DEPLOYMENT_TARGET=10.10

mkdir -p build

pushd build
curl -L --fail $pyoxidizer_url --output pyoxidizer.zip
unzip -o pyoxidizer.zip
popd

./build/pyoxidizer build exe --release --target-triple $target_triple

When run on a 10.10 machine:

$ ./build/x86_64-apple-darwin/release/exe/entronope
dyld: lazy symbol binding failed: Symbol not found: _getentropy
  Referenced from: /Volumes/VMware Shared Folders/entronope/./build/x86_64-apple-darwin/release/exe/entronope
  Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: _getentropy
  Referenced from: /Volumes/VMware Shared Folders/entronope/./build/x86_64-apple-darwin/release/exe/entronope
  Expected in: /usr/lib/libSystem.B.dylib

Trace/BPT trap: 5

And yep, this is a 3.8 issue, switching default_python_distribution to use Python 3.9 or 3.10 doesn't exhibit the issue anymore.

ddeville commented 2 years ago

Tried building with a standalone python 3.8 built off latest main and getting the same linking error.

indygreg commented 2 years ago

I want to say this is a bug in PyOxidizer then. But I'm the author of that project too, so we can continue discussing it here.

On my Intel MBP running and using the 12.3 SDK to build targeting 10.9, nm -mg on the resulting executable says _getentropy is weakly linked: (undefined) weak external _getentropy (from libSystem).

Do you see that same output? (nm -mg path/to/exe | grep _getentropy).

If you don't, then that's a bug in the build reproducibility of PyOxidizer. Probably something in an environment variable or config file somewhere changing defaults.

If you do see the same output, then it could be a bug in Python 3.8 code for dispatching to weak symbols. As I said before, my recollection is CPython didn't do weak linking correctly until 3.9. I believe those patches were backported to 3.8 and they may have made mistakes with the backports. e.g. there might be call sites where CPython isn't doing the runtime availability checks before calling into a symbol.

ddeville commented 2 years ago

ah you're right, I am also seeing (undefined) weak external _getentropy (from libSystem).

indygreg commented 2 years ago

Actually, I think I can save you some work: this is a bug in CPython 3.8 not doing runtime availability checks.

Compare the following:

There's no availability attributes on 3.8. Without the guard in place, it tries to resolve the weak symbol at runtime, leading to failure on older macOS versions. I'll report the issue to CPython.

ddeville commented 2 years ago

Ah! well that explains it :)

indygreg commented 2 years ago

Actually, I may not report the bug: https://github.com/python/cpython/commit/b29d0a5a7811418c0a1082ca188fd4850185e290 says that they intentionally omitted weak linking support from the 3.8 backport to reduce scope.

There are references to this decision at https://github.com/python/cpython/issues/85272#issuecomment-1093875621. There's probably more discussion floating around. But I can't make sense of things after the migration of the Python issue tracker to GitHub.

So, uh, I'm not sure where that leaves us.

My knee jerk is I think that means we need to avoid ~all weakly linked symbols coming from CPython 3.8 since they lack runtime availability guards?

Note there does appear to be runtime checking for _dyld_shared_cache_contains_path, ffi_closure_free, ffi_closure_alloc, and ffi_prep_closure_loc. Although since we provide our own libffi on macOS, those ffi symbols may not matter since USING_APPLE_OS_LIBFFI shouldn't be defined.

ddeville commented 2 years ago

Ah yep I followed some of those patches from afar since a coworker was involved (mostly the work around supporting building 3.8 for arm64) but I hadn't realized that there were also discussions around back porting weak-linking fixes.

So taking a step back, I proposed this change because it's what we were already doing for preadv and pwritev and that sounded like a nice generalization of that weak-linking issue. Obviously knowing that 3.9+ doesn't have this problem anymore it's absolutely not the right approach and we should just keep things as they are for the common case.

Regarding 3.8, I guess there's always the option of calling it a day and upgrading to 3.9 but it's not always practical and also people often find themselves stuck on an older version of cpython because of needing to keep support for old OS versions, which is exactly the use case that this is breaking.

So maybe a middle ground here would be - as you said - to special case 3.8 and ignore those weakly-linked symbol. For those folks who happen to have to support an older cpython version and fairly old macos versions, I feel like not having support for potentially better newer system calls is the least bad of two non-optimal situations.

Also worth noting that while I included all symbols that could lead to similar issues (lazily picked from an internal codebase that had to deal with that over the years), getentropy is really the only one that I've had issues with. I should go check whether the other ones actually do have availability guards but we might also not have to ignore a large set of symbols.

ddeville commented 2 years ago

Just saw dc87b17, excellent!

indygreg commented 2 years ago

In my mind 3.8 is the backwards compatible release supporting older machines. So changing just its build configuration to forego modern macOS features/symbols in the name of runtime compatibility is the way to go.

This commit I just pushed to main (which you commented on earlier) does just this.

It is unfortunate we have to deprive >95% of users with access to the symbols from using them. But I suppose that is the price we have to pay if we want to support such old macOS.

If any user is affected by this and really wants the symbols, they can upgrade to CPython 3.9 or 3.10, which continue to support macOS 10.9 on x86-64 and AFAIK don't have issues using weak symbols since CPython added runtime guards to all their weakly linked symbols.

Anyway, with 52b35f9 landing on main, I suppose there's no longer a need for this PR. So closing it.

Thank you for helping me get to the bottom of this. What a thorny bug!

indygreg commented 2 years ago

Oh, if you need this in a release, I might do a release over the weekend. We're a little bit behind on upgrading dependencies and I try to having this project track upstream pretty closely, as I know a growing number of projects are relying on these builds.

ddeville commented 2 years ago

Thank you! I learned a bunch along the way so that was also educative :)