indygreg / python-build-standalone

Produce redistributable builds of Python
Mozilla Public License 2.0

Kernel-dependent features like `os.pidfd_open()` #193

Open achimnol opened 12 months ago

achimnol commented 12 months ago

Recent Python versions add optional features such as `os.pidfd_open()` when the underlying system call is available at build time. As of Python 3.12, this is going to be the default implementation of asyncio's child process watcher, with a fallback to the thread-based legacy implementation.
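
For readers unfamiliar with the feature: a pidfd is a file descriptor that refers to a process and becomes readable once that process terminates, which is what lets an event loop watch children without a helper thread. Below is a minimal C sketch of that mechanism, assuming Linux and glibc headers that define `SYS_pidfd_open`. The helper name `wait_child_exit` is hypothetical and not part of CPython or asyncio. When the call is unavailable (missing from the headers, `ENOSYS` on an older kernel, or blocked by a sandbox), it falls back to a plain blocking `waitpid()`, analogous in spirit to the fallback described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a CPython or asyncio API): wait for `child` to
 * terminate and return its raw wait status, or -1 on error.
 * *used_pidfd is set to 1 if the pidfd + poll() path was taken. */
static int wait_child_exit(pid_t child, int *used_pidfd)
{
    int status = 0;
    *used_pidfd = 0;
#ifdef SYS_pidfd_open
    /* Assumes Linux >= 5.3; older kernels fail with ENOSYS. */
    int pidfd = (int)syscall(SYS_pidfd_open, child, 0);
    if (pidfd >= 0) {
        /* A pidfd polls as readable once the process has terminated. */
        struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
        int rc;
        do {
            rc = poll(&pfd, 1, -1);
        } while (rc < 0 && errno == EINTR);
        close(pidfd);
        *used_pidfd = (rc > 0);
    }
    /* If pidfd_open() failed, fall through to the blocking fallback. */
#endif
    /* Reap the child: returns immediately if the pidfd already reported
     * its exit, and blocks otherwise (the fallback path). */
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return status;
}
```

In a real event loop the pidfd would be registered with the loop's selector instead of being passed to a blocking `poll()`; the blocking call only keeps the sketch short.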

It seems that the current Python 3.11 distribution in this repo does not have `os.pidfd_open()`. This is not a critical regression, because most libraries that depend on it have a good fallback, but I'd like to be able to adopt such new kernel features.

I'm not sure how the current release process could handle this. Maybe we would need multiple builds targeting several Linux kernel versions, similar to the x86_64 v2/v3/... CPU microarchitecture levels. I'm afraid this would place too much burden on the build infrastructure.

I'm just reporting this issue for future reference, though.

indygreg commented 12 months ago

The way things currently work is that we build the CPython distributions against very old kernel headers and glibc to ensure maximum binary portability by targeting a very old, ~universally supported syscall + glibc API surface.
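
One way to see the build-time versus run-time split described here is to compare the glibc version a binary was compiled against with the one it is actually running on. The sketch below assumes a glibc system (`__GLIBC__`, `__GLIBC_MINOR__`, and `gnu_get_libc_version()` are glibc-specific), and `describe_glibc` is a hypothetical helper. A distribution built against an old glibc would report that old compile-time version here even when running on a much newer system.

```c
#include <stdio.h>
#if defined(__GLIBC__)
#include <gnu/libc-version.h>
#endif

/* Hypothetical helper: describe the glibc version fixed by the headers at
 * build time versus the one loaded at run time.
 * Returns 1 on glibc, 0 otherwise. */
static int describe_glibc(char *buf, size_t len)
{
#if defined(__GLIBC__)
    snprintf(buf, len, "built against glibc %d.%d, running on glibc %s",
             __GLIBC__, __GLIBC_MINOR__, gnu_get_libc_version());
    return 1;
#else
    snprintf(buf, len, "not built against glibc");
    return 0;
#endif
}
```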

CPython currently uses compile-time checks for features such as `pidfd_open()`: literally a `#if defined(__linux__) && defined(__NR_pidfd_open)` in the C source code. So if the feature isn't present at compile time, you don't get it at run time.
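
The consequence of such a compile-time gate fits in a few lines. This is a sketch, not CPython's actual code: it reuses the quoted condition, and `compat_pidfd_open` is a hypothetical wrapper. If the kernel headers used for the build lack `__NR_pidfd_open`, the fallback branch is the only one compiled in, so the function fails with `ENOSYS` even on a kernel that implements the syscall.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>  /* pulls in <asm/unistd.h>, which may define __NR_pidfd_open */
#include <sys/types.h>
#include <unistd.h>

/* Same condition as the check quoted above; evaluated once, at compile
 * time, against whatever kernel headers the build used. */
#if defined(__linux__) && defined(__NR_pidfd_open)
#define HAVE_PIDFD_OPEN_AT_BUILD 1
#else
#define HAVE_PIDFD_OPEN_AT_BUILD 0
#endif

/* Hypothetical wrapper: returns a pidfd, or -1 with errno set. */
static int compat_pidfd_open(pid_t pid)
{
#if HAVE_PIDFD_OPEN_AT_BUILD
    return (int)syscall(__NR_pidfd_open, pid, 0);
#else
    /* The code path was never compiled in, so the kernel is never asked. */
    (void)pid;
    errno = ENOSYS;
    return -1;
#endif
}
```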

On macOS, CPython uses Mach-O weak symbols and run-time checks, which allow it to reference macOS SDK APIs that aren't available on all machines. If they are present at run time, CPython can use them; if they aren't, the features depending on them are unavailable. This is ideal from an end-user perspective because it doesn't penalize the common user running a modern macOS by depriving them of newer features.
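
As an illustration of the macOS pattern, the sketch below assumes Clang and an Apple SDK. `read_monotonic` is a hypothetical helper, and `clock_gettime()` (introduced in macOS 10.12) merely stands in for any newer API. With a deployment target older than 10.12, the compiler treats the symbol as weakly linked, and `__builtin_available` performs the run-time OS version check. On other platforms the sketch simply calls `clock_gettime()` directly.

```c
#if !defined(__APPLE__)
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() on glibc */
#endif
#include <time.h>

/* Hypothetical helper: returns 1 and fills *ts if a monotonic clock
 * reading was obtained, 0 if the API is unavailable or fails. */
static int read_monotonic(struct timespec *ts)
{
#if defined(__APPLE__) && defined(__clang__)
    /* With a deployment target below macOS 10.12, clock_gettime() is
     * weakly linked; this run-time OS version check guards the call. */
    if (__builtin_available(macOS 10.12, *)) {
        return clock_gettime(CLOCK_MONOTONIC, ts) == 0;
    }
    return 0;  /* older macOS: feature unavailable, caller must fall back */
#else
    /* Other platforms: this sketch just assumes clock_gettime() exists. */
    return clock_gettime(CLOCK_MONOTONIC, ts) == 0;
#endif
}
```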

On Linux, ELF supports weak symbols/linking, which is conceptually similar to the Mach-O feature. Usually you add a compiler `#pragma` or similar directive to mark a symbol as weakly linked. If the symbol resolves at run time, the symbol/function address is non-NULL and you can call it.
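
A minimal ELF sketch of the weak-reference pattern, assuming GCC or Clang on Linux; `optional_feature` and `call_with_fallback` are hypothetical names. Because nothing in the program defines `optional_feature`, the link still succeeds and its address is NULL at run time, so the fallback path runs. The sketch deliberately avoids real glibc symbols, whose symbol versioning it does not attempt to model.

```c
#include <stddef.h>

/* Hypothetical optional symbol. Declared weak, so an unresolved reference
 * is not a link error: its address is simply NULL at run time. */
extern int optional_feature(int x) __attribute__((weak));

/* Hypothetical caller: uses optional_feature() when it resolved,
 * otherwise a fallback. Returns 1 if the optional path was taken. */
static int call_with_fallback(int x, int *result)
{
    if (optional_feature != NULL) {
        *result = optional_feature(x);
        return 1;
    }
    *result = x;  /* fallback behavior when the symbol is absent */
    return 0;
}
```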

Unfortunately, I don't believe CPython has any support for weak symbols on Linux/ELF, so getting run-time conditional features isn't trivially achievable. (I'd have to page this back into my brain, but I want to say there are some practical limitations of weak symbols on ELF that may make their use non-viable. Even if there are, similar features like IFUNC could potentially be employed for dynamic run-time dispatch.)
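
For comparison, a minimal GNU IFUNC sketch, assuming GCC or Clang targeting ELF with glibc. The resolver runs once while the dynamic linker processes relocations and returns the implementation that `do_work` is bound to; all function names are hypothetical. It dispatches on a CPU feature (x86 AVX2) because that is the conventional IFUNC use case. Probing kernel or glibc features from a resolver is more delicate, since resolvers run during relocation processing, when calling into other libraries may not yet be safe.

```c
/* Two hypothetical implementations of the same operation. */
static int do_work_generic(int x) { return x + 1; }
static int do_work_avx2(int x)    { return x + 1; }  /* imagine a vectorized version */

/* Resolver: called once by the dynamic linker during relocation; its
 * return value becomes the address that calls to do_work() jump to. */
static int (*resolve_do_work(void))(int)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  /* required: resolvers may run before constructors */
    if (__builtin_cpu_supports("avx2"))
        return do_work_avx2;
#endif
    return do_work_generic;
}

/* Public symbol whose implementation is chosen at load time. */
int do_work(int x) __attribute__((ifunc("resolve_do_work")));
```

Callers simply call `do_work(41)`; the dispatch cost is paid once at load time rather than on every call.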

That's a long way of saying that features like `pidfd_open()` currently require separate build variants to work, which leads to an explosion of build variants targeting various Linux + glibc feature levels. That becomes unwieldy very fast.

It might be worth engaging upstream CPython about supporting weak symbols, IFUNC, or similar run-time dynamic feature detection on Linux/ELF like they do on macOS. This would allow pre-built CPython binaries to have better performance and features versus what is achievable today.

@gpshead is this worth a discussion in a CPython forum? If so, which one?

achimnol commented 12 months ago

Thanks for the detailed explanation! I don't expect this issue to be fixed in the near future, but I appreciate that we could start a discussion about adding weak symbol support on Linux.

achimnol commented 11 months ago

Just a note for others: the motivation for my question/request is that when distributing Python-based apps to uncontrolled "enterprise" environments (e.g., air-gapped, customer-owned clusters), we need to enable or disable such kernel-specific features based on run-time availability, without rebuilding everything for each customer site.