python-wheel-build / fromager

Build your own wheels
https://fromager.readthedocs.io/en/latest/
Apache License 2.0
7 stars 11 forks source link

Add package settings to modify install dependencies `Requires-Dist` #484

Open tiran opened 1 month ago

tiran commented 1 month ago

In PR #407 we added a feature to modify the build time dependencies of a package. In a few cases, we also want to modify the installation time dependencies (Requires-Dist) of a wheel. Sometimes we want to remove a dependency, sometimes we want to change the version constraint of a package. At the moment, we are using patches for the task. Patches costly to maintain and often conflict when a package is updated.

Every build system has one or more options to store build dependencies. Some use pyproject.toml, some read from a requirements.txt, some packages create dependencies in the fly in their setup.py. The easiest approach to modify install requires is to rewrite METADATA file of the wheel.

Proposal

Add two new fields update_dist_requires and remove_dist_requires to ProjectOverride. The values are a list[Requirement].

When a package settings as either update or remove dist requires, then

Example from InstructLab's METADATA:

Requires-Dist: xdg-base-dirs>=6.0.1
Requires-Dist: psutil>=6.0.0
Requires-Dist: mlx<0.6.0,>=0.5.1; sys_platform == "darwin" and platform_machine == "arm64"
Provides-Extra: cpu
Provides-Extra: cuda
Requires-Dist: instructlab-training[cuda]>=0.5.0; extra == "cuda"
Provides-Extra: hpu
Requires-Dist: optimum>=1.21.0; extra == "hpu"
dhellmann commented 1 month ago

Modifying the wheel would result in publishing binaries that don't match the source code, which might cause us some issues downstream. Is there a generic way to patch the source? Not everything uses pyproject.toml, so maybe not?

tiran commented 1 month ago

There is no generic way. There isn't even a generic way for pyproject.toml-only projects. It depends on the build backend.

Setuptools uses project.dependencies, project.optional-dependencies, or tool.setuptools.dynamic. Dynamic dependencies are read from a file. flit, PDM, and other backends have their own ways to deal with dynamic dependencies. Poetry does some really ... poetic things. And projects with setup.py run arbitrary code to define their installation dependencies. Our internal dependencies are all over the place. Some check for available hardware and PyTorch build flags to assemble requirements.

I agree that it would be better if we could patch the dist requirements in the sources. But that's at least a magnitude more complex and requires a ton of special code. Our platlib wheels already depend on information that is not in the the sdist anyway, e.g. env vars from the builder container, installed packages, and package settings. We could include the package settings yaml in the metadata to document the modifications.

dhellmann commented 1 month ago

We should look at how to include build documentation separately from this change. We have multiple settings files that could influence a given build, but we should be able to aggregate all of the information.

tiran commented 3 weeks ago

If we decide to move forward with this approach, then we also have update the Dynamic field (iff present), https://packaging.python.org/en/latest/specifications/core-metadata/#dynamic-multiple-use