pypa / manylinux

Python wheels that work on any linux (almost)
MIT License

Tracking issue for manylinux_2_34 image #1585

Open h-vetinari opened 6 months ago

h-vetinari commented 6 months ago

As time goes by, projects start using new glibc features not present in old versions, hence requiring a newer manylinux baseline version to be defined.

For example:

Also, according to the manylinux-timeline, roughly 40% of users on non-EOL Python versions already have glibc 2.34 or newer.

IMO the discussion in #1282 (and the failure of the Debian-based 2_24, cf. #1012) showed comprehensively that we should stick with a RHEL-derivative also for the next manylinux standard: the devtoolset backports make all the difference w.r.t. longevity, and all those flavours are ABI-compatible.

This would also continue the pattern for all major manylinux standards so far (again excepting 2_24):

| manylinux | glibc | RHEL |
|---|---|---|
| manylinux1 | 2.5 | 5 |
| manylinux2010 | 2.12 | 6 |
| manylinux2014 | 2.17 | 7 |
| manylinux_2_28 | 2.28 | 8 |
| manylinux_2_34 | 2.34 | 9 |
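
As a rough illustration of how the tag names in this table relate to glibc versions, here is a minimal Python sketch. The legacy names map to the glibc versions listed above (PEP 600 defines them as aliases of the corresponding `manylinux_X_Y` tags). The helper names `glibc_baseline` and `is_compatible` are hypothetical and not part of any existing tool. The check is also deliberately simplified: real installers also consider the architecture and other constraints.

```python
import re

# Legacy tag names and the glibc baseline each one stands for,
# taken from the table above.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}

# Perennial tags encode the glibc version directly: manylinux_<major>_<minor>.
_PERENNIAL = re.compile(r"^manylinux_(\d+)_(\d+)$")


def glibc_baseline(tag: str) -> tuple[int, int]:
    """Return the minimum glibc (major, minor) implied by a manylinux tag.

    `tag` is the policy name without the architecture suffix,
    e.g. "manylinux2014" or "manylinux_2_34".
    """
    if tag in LEGACY_ALIASES:
        return LEGACY_ALIASES[tag]
    match = _PERENNIAL.match(tag)
    if match is None:
        raise ValueError(f"not a manylinux tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def is_compatible(tag: str, system_glibc: tuple[int, int]) -> bool:
    """Simplified check: the system glibc must be at least the tag's baseline."""
    return system_glibc >= glibc_baseline(tag)
```

For instance, under these assumptions a system with glibc 2.31 would accept `manylinux_2_28` wheels but not `manylinux_2_34` ones.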

Updating the flavour table from #1282:

The big question is what base image to use. We can choose between C9S (centos:stream9), AlmaLinux, RockyLinux, or UBI. Personally I would prefer an image based entirely on open source software. [...]

Obviously Alma (as the base for 2_28) is a strong candidate, though since then, RHEL has announced it will not provide its sources anymore, creating some doubts initially how Alma will continue, though in the end, they decided they have everything they need to keep going. They'll keep ABI-compatibility (which is by far the most important), but will drop bug-for-bug compatibility (which is IMO not a big deal for us). Still, this might influence the decision; at the time of #1282, the main argument against using UBI was the lack of a gcc-toolset, but this has since been added (both to UBI 8 & 9).

| base image | EOL¹ | gcc-toolset-13² | aarch64 | ppc64le | s390x | bug fix availability |
|---|---|---|---|---|---|---|
| CentOS Stream 9 | 2027-05-31³ | yes | yes³ | yes³ | yes³ | 1st |
| UBI 9 | 2032-05-31 | yes⁴ | yes⁵ | yes⁵ | yes⁵ | 2nd |
| AlmaLinux 9 | 2032-05-31 | yes | yes⁶ | yes⁶ | yes⁶ | 3rd |
| RockyLinux 9 | 2032-05-31 | yes | yes⁷ | yes⁷ | yes⁷ | 3rd |

1: including Maintenance Support, https://access.redhat.com/support/policy/updates/errata/#Life_Cycle_Dates
2: https://pkgs.org/download/gcc-toolset-13
3: https://www.centos.org/centos-stream/
4: at least gcc 12, probably on gcc 13 already
5: https://hub.docker.com/r/redhat/ubi9-minimal/tags
6: https://github.com/AlmaLinux/cloud-images / https://hub.docker.com/r/almalinux/9-minimal
7: https://hub.docker.com/r/rockylinux/rockylinux/tags


The checklist is short, since wheel and pip, which both use packaging, and warehouse do not require additional PRs.

Publisher-side Support:

Additional projects to double-check for support, perhaps these are already done?

mayeut commented 6 months ago

Thanks for the tracking issue.

manylinux_2_34 is already supported by auditwheel so there's no need for a new version: https://github.com/pypa/auditwheel/pull/405

While UBI added support for gcc-toolset, it might still not support some -devel packages so, IMHO, it's fine to use AlmaLinux for now as is done for manylinux_2_28.

> They'll keep ABI-compatibility (which is by far the most important), but will drop bug-for-bug compatibility (which is IMO not a big deal for us).

Indeed, we only aim for ABI-compatibility.

snnn commented 5 months ago

> showed comprehensively that we should stick with a RHEL-derivative

I hold a different opinion on this and just left a few comments in the original issues to explain why. Mainly,

  1. RHEL is not free. Nor can the derivatives provide enough security updates for free.
  2. From a supply chain point of view, you would not want to be bound to a single supplier.

I would suggest not going further down this road. Instead, give the existing users enough time to transition to an alternative solution. For example, as you said, half of the PyPI users already have glibc>=2.34, which means they also have a decent GCC compiler and a very new libstdc++, which diminishes the need for devtoolset as time goes by.

h-vetinari commented 5 months ago

> From a supply chain point of view, you would not want to be bound to a single supplier.

So far all relevant[^1] manylinux images have been based on RHEL-derivatives, and this system has worked very well. Red Hat itself - though a single vendor - is about as far as possible from a supply chain risk as any single company can be.

Your proposal in #1012 is not feasible (in my opinion). Your argument against UBI in #1282 may be correct, but then we can still stay with Alma (which will also fix security issues in the rpms).

[^1]: I'm deliberately not counting 2_24.

> For example, as you said, half of the PyPI users already have glibc>=2.34, which means they also have a decent GCC compiler and a very new libstdc++, which diminishes the need for devtoolset as time goes by.

Try telling maintainers that they should halve their user base. Said differently, a lot of projects care about providing support that is as broad as possible across many different uses (many of which get stuck on old distros for whatever reason), and raising their glibc baseline is done extremely conservatively. This creates an obvious tension between needing to support old systems and wanting to use +/- contemporary compilers. The only sustainable solution to this over the relevant lifecycles has been the devtoolset.

snnn commented 5 months ago

Functionally, the existing Alma-based images work very well, but if you run a security scanner to see whether any of the components in the images need to be patched, it is a different story. So my concern is more from a supply chain point of view. If a toolchain you use has a known vulnerability, it could be used to inject a backdoor into the package you are building. We do not see a reliable way to get security patches for the RHEL-derivatives, mainly because Red Hat doesn't want to provide them for free.

dralley commented 5 months ago

> RHEL is not free. Nor can the derivatives provide enough security updates for free.

RHEL Universal Base Image is free (and redistributable), fwiw.

https://catalog.redhat.com/software/base-images

I believe many of the downsides of CentOS Stream have also been addressed since the previous discussion, although the 5 year lifecycle remains. For that reason either UBI or Alma images would (I assume) probably be preferred.

snnn commented 5 months ago

Yes, the image is free. But anything users install into the image is not covered. manylinux_2_34 needs to install the GCC compiler and a lot of other software from dnf repos. This software may or may not get security updates; that's my biggest concern. For example, GCC depends on GMP. The gmp package in UBI's repo has a known security vulnerability. The UBI8 base image is well patched, but the packages hosted in UBI8's official repos are not. If manylinux doesn't need to manually install any RPM package (i.e. an official UBI image can provide everything we need), I would be less concerned.

dralley commented 5 months ago

@carlwgeorge ^ can you speak to those concerns

carlwgeorge commented 5 months ago

> Your argument against UBI in https://github.com/pypa/manylinux/issues/1282 may be correct,

They're not, please see my other reply for the full breakdown.

> We do not see a reliable way to get security patches for the RHEL-derivatives, mainly because Red Hat doesn't want to provide them for free.

For packages in the UBI content set, Red Hat does provide them, for free, under freely redistributable terms. If something needs to be added to that content set for UBI to be a viable solution, I'm happy to advocate for that.

> Yes, the image is free. But anything users install into the image is not covered. manylinux_2_34 needs to install the GCC compiler and a lot of other software from dnf repos. This software may or may not get security updates; that's my biggest concern. For example, GCC depends on GMP. The gmp package in UBI's repo has a known security vulnerability. The UBI8 base image is well patched, but the packages hosted in UBI8's official repos are not. If manylinux doesn't need to manually install any RPM package (i.e. an official UBI image can provide everything we need), I would be less concerned.

This is all bogus, for the reasons I described in my other reply. For that GMP security vulnerability, do you know who has that patched? CentOS Stream 8, RHEL 8.8 EUS, and RHEL 8.6 EUS. That's it. RHEL 8.9, Alma 8, and Rocky 8 don't have it. Somehow that fix snuck out in the UBI 8 image (not the repos) ahead of the RHEL 8.10 release, which is likely a mistake but not evidence of UBI repo packages not being maintained.

snnn commented 5 months ago

Great to know that. Thanks!

ncoghlan commented 3 months ago

From a docs PoV, it would be helpful to have a concise list of the key distro versions shipping glibc 2.34 or later (similar to the list I am suggesting to add for manylinux_2_28 and manylinux2014 in https://github.com/pypa/manylinux/pull/1635 )

For glibc 2.34, that would be:

mayeut commented 3 months ago

I'd like to wait for gcc-toolset-14 for this one. In any case, given the CI state of things, we need to wait for either #1619 or #1629.

h-vetinari commented 2 months ago

If musl 1.1 was already at 2.5% (of musllinux!) downloads ~2 months ago, is it really necessary/beneficial to wait until November to remove it?

Independently of that, I don't see why the release of other 2_34 images would necessarily need to wait for musllinux; those could come later as constraints get resolved.

EwoutH commented 2 months ago

Nice milestone: glibc 2.34 is now supported on over 50% of consumer systems using Python.

glibc 2.35 is also supported on more than half of the systems, probably because Ubuntu 22.04 ships glibc 2.35 by default (and has a huge market share).
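
To see where a particular machine sits relative to these thresholds, a minimal Python sketch along the following lines may help. It reads the glibc version through the standard library's `os.confstr`; the function names `parse_glibc_version` and `local_glibc_version` are made up for this illustration. It assumes a glibc-based Linux system and simply returns `None` anywhere the value is unavailable (for example on musl, macOS, or Windows).

```python
import os


def parse_glibc_version(text: str) -> tuple[int, int]:
    """Parse a string such as "glibc 2.35" into (2, 35)."""
    name, _, version = text.partition(" ")
    if name != "glibc":
        raise ValueError(f"not a glibc version string: {text!r}")
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def local_glibc_version():
    """Return the running system's glibc (major, minor), or None if unknown."""
    try:
        # On glibc systems this yields a string like "glibc 2.35".
        text = os.confstr("CS_GNU_LIBC_VERSION")
        return parse_glibc_version(text) if text else None
    except (AttributeError, ValueError, OSError):
        # No os.confstr, unknown name, or not a glibc system.
        return None


if __name__ == "__main__":
    version = local_glibc_version()
    if version is None:
        print("glibc version not detected (non-glibc system?)")
    else:
        meets = version >= (2, 34)
        print(f"glibc {version[0]}.{version[1]}; meets 2.34 baseline: {meets}")
```

Comparing the resulting tuple against `(2, 34)` is the same kind of check the download statistics above are counting.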

EwoutH commented 1 month ago

> I'd like to wait for gcc-toolset-14 for this one.

Seems like something is in the pipeline: https://gitlab.com/redhat/centos-stream/rpms/gcc-toolset-14-gcc

Where would a published version appear?

carlwgeorge commented 1 month ago

CentOS Stream 9 has gcc-toolset-14 available for installation right now.

```
root@c9-container:~# dnf info gcc-toolset-14-gcc
Last metadata expiration check: 0:03:03 ago on Mon Aug 26 16:35:18 2024.
Available Packages
Name         : gcc-toolset-14-gcc
Version      : 14.2.1
Release      : 1.1.el9
Architecture : x86_64
Size         : 47 M
Source       : gcc-toolset-14-gcc-14.2.1-1.1.el9.src.rpm
Repository   : appstream
Summary      : GCC version 14
URL          : http://gcc.gnu.org
License      : GPL-3.0-or-later AND LGPL-3.0-or-later AND (GPL-3.0-or-later WITH GCC-exception-3.1) AND (GPL-3.0-or-later WITH Texinfo-exception) AND
             : (LGPL-2.1-or-later WITH GCC-exception-2.0) AND (GPL-2.0-or-later WITH GCC-exception-2.0) AND (GPL-2.0-or-later WITH GNU-compiler-exception) AND
             : BSL-1.0 AND GFDL-1.3-or-later AND Linux-man-pages-copyleft-2-para AND SunPro AND BSD-1-Clause AND BSD-2-Clause AND BSD-2-Clause-Views AND
             : BSD-3-Clause AND BSD-4-Clause AND BSD-Source-Code AND Zlib AND MIT AND Apache-2.0 AND (Apache-2.0 WITH LLVM-Exception) AND ZPL-2.1 AND ISC AND
             : LicenseRef-Fedora-Public-Domain AND HP-1986 AND curl AND Martin-Birgmeier AND HPND-Markus-Kuhn AND dtoa AND SMLNJ AND AMD-newlib AND OAR AND
             : HPND-merchantability-variant AND HPND-Intel
Description  : The gcc-toolset-14-gcc package contains the GNU Compiler Collection
             : version 14.
```