mesonbuild / meson-python

Meson PEP 517 Python build backend
https://mesonbuild.com/meson-python/
MIT License
120 stars 59 forks source link

Better handling of the `patchelf` dependency #485

Closed dnicolodi closed 10 months ago

dnicolodi commented 10 months ago

meson-python needs patchelf to manipulate the RPATH of executables [1], shared libraries, and extension modules when a shared library is shipped as part of the Python package on Linux. patchelf is not a Pythoin package, but a PyPI package for patchelf exists, similarly as for ninja. Currently the meson-python declares a dependency on the patchelf package if patchelf is not found on the system and if the package being built is not a pure Python package

https://github.com/mesonbuild/meson-python/blob/6c720bbae702b9fbb90af0434dcdb042d1d14084/mesonpy/__init__.py#L1046-L1058

To check if the package being built is pure, meson-python checks whether there are any files distributed in platlib or if any of the distributed scripts is a native executable:

https://github.com/mesonbuild/meson-python/blob/6c720bbae702b9fbb90af0434dcdb042d1d14084/mesonpy/__init__.py#L336-L347

There is no check for whether any shared library is distributed with the package. This causes patchelf to be requested when is not actually needed for most packages built with meson-python (I think that distributing a shared library with the package is somehow rare).

Another, IMO more important, issue is that the first check requires the meson project to be configured and the second requires the project to be also built. These operations happen in a temporary build directory, thus, to actually build the wheel, the project is configured (and sometimes built) twice.

The rationale behind this appears to be the observation that it is cheaper to configure (and eventually build) the project twice than installing a binary wheel that may not be used. I think that this compromise is the wrong one. patchelf is a 450 kB binary wheel and pip nowadays ships with caching enabled by default, thus the cost of installing patchelf is small and mostly a one time thing (I assume downloading 450 kB dominates the cost, and that unpacking the wheel into the temporary build environment is negligible compared to other things required to build the wheel), on the other hand, configuring the meson project is a cost paid for every version of package built. The observation that no one has yet complained of the fact that meson-python requires to install patchelf for simple packages that do not ship shared libraries seems to corroborate this thesis.

I propose to always require patchelf on Linux when a system provided patchelf is not found.

This will affect only projects that use meson-python to build pure Python wheels. I suspect these to be extremely rare, meson-python itself being the exception. And these are also the kind of projects for which providing wheels on PyPI is extremely easy, thus I expect the global cost of this change to be minimal. On the other hand, the global savings for most of the other projects that don't need to configure the meson project twice will be significant.

By the way, does anyone know if there is a way to query PyPI for all the projects that declare a build dependency on meson-python?

[1] The oldest bug report against meson-python #3 actually claims that the mechanism does not work for executables. But many things changed since then and I haven't verified if this is the case.

dnicolodi commented 10 months ago

Also, as noted here https://github.com/mesonbuild/meson-python/blob/6c720bbae702b9fbb90af0434dcdb042d1d14084/mesonpy/__init__.py#L1056-L1057 meson since version 1.1.0 supports a "none" backend that basically allows only to configure a project and install files, see https://github.com/mesonbuild/meson/pull/10779

It don't have a detailed plan for how that could be integrated into meson-python but it could be a solution for pure Python packages to drop the dependency on ninja and patchelf. The easiest would probably be to add a new option:

[tool.meson-python]
backend = 'none'

that would drop the requirement for ninja and patchelf, on Linux, and would configure the project with

meson setup $builddir --backend=none
dnicolodi commented 10 months ago

I propose to always require patchelf on Linux when a system provided patchelf is not found.

An alternative I considered is to drop the detection of the need for patchelf and require the projects that need it to add it to their build-system.requires, but I don't know how that could be made conditional to the presence of patchelf on the system.

rgommers commented 10 months ago

Probably a good idea to revisit this indeed. I haven't noticed the double configure step, because I never use a system where it's not already installed. GitHub Actions Ubuntu images have it installed as well.

An alternative I considered is to drop the detection of the need for patchelf and require the projects that need it to add it to their build-system.requires,

I think this is the better alternative of the two you proposed. It limits the impact to the few projects that actually need this, rather than imposing an extra transitive dependency on every project.

but I don't know how that could be made conditional to the presence of patchelf on the system.

It can't, but that may still be fine. At that point it's a minor optimization that's lost for the packages that do need it. The gain for the many packages that don't need it seems more important.

dnicolodi commented 10 months ago

I don't like much the solution of having the packages that need patchelf list it in their build-system.requires for two reasons: it feels wrong to have to install a Python package for something that can (and probably is better to) be provided by the distribution, and most importantly, it break backwards compatibility.

As you say, most systems already come with patchelf installed, and the cost of unpacking it into the build environment is really minimal. We have been imposing this cost on everybody already (except the projects that use meson-python for pure Python projects, of which I thing meson-python itself is the only example), basically since day zero. What I'm proposing to do is to continue to do that, but do not require the additional meson setup run that is required now.

eli-schwartz commented 10 months ago

If you're going to require every package that has a binary extension which patchelf could be reasonably run on, add something to their pyproject.toml to say so, maybe it would make sense to have them add [tool.meson-python] patchelf = true instead of [build-system] requires = ['patchelf'].

I do think that given a choice between running the full build in order to figure out whether patchelf is needed, or just assuming the build will have compiled extensions, it's so much simpler and more reliable to simply say: "meson-python depends on patchelf, no exceptions, and if you don't have your own then we will download one for you". But the "none" backend would be neat to support and part of my reason was for situations like this.

Anecdotally: Arch Linux, Fedora, Gentoo, and Void Linux all have their package unconditionally depend on patchelf Debian has patchelf as a Recommends. Alpine Linux doesn't depend on it at all, neither in meson-python nor in scipy, so I wonder how that works.

dnicolodi commented 10 months ago

@eli-schwartz There may be a misunderstanding: patchelf is needed only if the package distributes a shared library (and relies in the meson-python facilities for handling the installation location and the rpath) and something else that links to it. I don't think there are many packages doing this. However, we now run at least meson setup on the package to determine if it includes any compiled artifact and add a requirement for patchelf if it does.

This has two drawbacks: patchelf is required in many cases when it is not needed, and it requires running meson setup twice. On packages that distribute anything that ends up in the scripts path in the wheel it also requires a full compilation. What I propose to do is to assume that the vast majority of packages using meson-python distribute extensions and skip the meson setup required for determining this, and always request the patchelf package if a patchelf executable is not detected on the system. This maintains the status quo of requiring patchelf for packages that don't need it, but removes the need for running an extra meson setup.

dnicolodi commented 10 months ago

Adding a configuration knob is of course a possibility, but configuration knobs add complexity for all users, and requiring to set a configuration knob is a compatibility break. I think it all boils down to how onerous we assume it is to install a 500 kB package on systems that do not have patchelf installed.

rgommers commented 10 months ago

I think it all boils down to how onerous we assume it is to install a 500 kB package on systems that do not have patchelf installed.

I think it is a regression - in particular because it's now in every Python environment, instead of only once system-wide. It's the same as meson and meson-python both not depending on ninja from PyPI.

I will also note that in the current situation, we haven't received a single complaint yet (AFAIK). And for those cases where someone would notice, the solution is dead simple: install patchelf with your distro package manager, and the problem is gone.

it feels wrong to have to install a Python package for something that can (and probably is better to) be provided by the distribution

Exactly:) With https://peps.python.org/pep-0725/ it could be better in the future, it'd be something like:

[external]
dependencies = [
    "pkg:generic/ninja",
    "pkg:generic/patchelf; platform_system=='Linux'",
]

in our pyproject.toml.

dnicolodi commented 10 months ago

I think it is a regression - in particular because it's now in every Python environment, instead of only once system-wide. It's the same as meson and meson-python both not depending on ninja from PyPI.

I think I was clear on this point. Probably I was not. Of course we would request the patchelf package only if we don't find a suitable patchelf binary on the system, like we do now.

I will also note that in the current situation, we haven't received a single complaint yet (AFAIK). And for those cases where someone would notice, the solution is dead simple: install patchelf with your distro package manager, and the problem is gone.

I start to think that the initial explanation was too long an has not been read carefully. There is no such problem: we currently request patchelf also when not required.

With https://peps.python.org/pep-0725/ it could be better in the future, it'd be something like:

Without diminishing the work that went into this PEP, the PEP says that it does not concern itself with what the consumers of the information specified in that format do with it. Until this is not hashed out, it is not useful to solve the issue at hand.

rgommers commented 10 months ago

Of course we would request the patchelf package only if we don't find a suitable patchelf binary on the system, like we do now.

Oh okay, I think I lost that in the long-ish thread, sorry about that. This is probably a patch of only a few lines long, deleting the is_pure check? If so, maybe open a PR for your preferred solution?

rgommers commented 10 months ago

Until this is not hashed out, it is not useful to solve the issue at hand.

Sure, it was more a conceptual comment. It's going to take quite a while before that gets accepted and then implemented in packaging tools.