pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

PEP 518 build requirements cannot be overridden by user #4582

Open ghost opened 7 years ago

ghost commented 7 years ago

Apparently not, it seems to call pip install --ignore-installed .... Because the build itself is not isolated from the environment in other respects, I'm not sure if this is actually sensible behavior by pip...

If the target computer already has a satisfactory version of numpy, then the build system should use that version. Only if the version is not already installed should pip use an isolated environment.

Related: scipy/scipy#7309

rgommers commented 4 years ago

> Stepping back from the specific request of overriding build dependencies, the problem presented in the top post can be avoided by adding additional logic to how build dependencies are chosen. When a package specifies numpy (for example) as a build dependency, pip can freely choose any version of numpy. Right now it chooses the latest simply because that’s the default logic. But we could instead condition the logic to prefer matching the run-time environment where possible, which would keep the spirit of build isolation while also solving the build/run-time ABI mismatch problem.

+1 this is a healthy idea in general, and I don't see serious downsides.

Note that for numpy specifically, we try to teach people good habits, and there's a package oldest-supported-numpy that people can depend on in pyproject.toml. But many people new to shipping a package on PyPI won't be aware of that.
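
For readers unfamiliar with that pattern, a minimal sketch of what it looks like in a project's pyproject.toml (the backend and the other entries are placeholders that depend on the project):

```toml
[build-system]
# Build against the oldest NumPy that supports each Python version/platform,
# so the built extension stays ABI-compatible with that NumPy and anything newer.
requires = [
    "setuptools>=61",          # placeholder; whichever backend/version the project needs
    "wheel",
    "oldest-supported-numpy",
    "Cython",                  # only if the project actually uses Cython
]
build-backend = "setuptools.build_meta"
```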

pradyunsg commented 2 years ago

Something like the situations discussed here happened today -- setuptools has started rejecting invalid metadata, and users affected by this have no easy workarounds.

@jaraco posted #10669, with the following design for a solution.

I imagine a solution in which pip offers options to extend and constrain build dependencies at install time. Something like:

--build-requires=<dependencies or file:requirements>
--build-constraints=<constraints or file:constraints>

These additional requirements would apply to all builds during the installation. To narrow where the specifications apply, a per-project form should also be allowed:

--build-requires=<project>:<dependencies or file:requirements>
--build-constraints=<project>:<constraints or file:constraints>

For a concrete example, consider a build where setuptools<59 is needed for django-hijack, setuptools_hacks.distutils_workaround is needed for all projects, and the deps in scipy-deps.txt are required for mynumpy-proj:

pip install --use-pep517 --build-constraints "django-hijack:setuptools<59" --build-requires "setuptools_hacks.distutils_workaround" --build-requires "mynumpy-proj:file:scipy-deps.txt"

The same specification should be able to be supplied through environment variables.
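
Presumably, following pip's existing convention of mapping long options to PIP_<OPTION_NAME> environment variables, the environment-variable spelling would look something like the sketch below. These options are part of the proposal and do not exist in pip today, so the variable names are hypothetical:

```shell
# Hypothetical: neither option nor variable exists in pip at the time of writing.
export PIP_BUILD_REQUIRES="setuptools_hacks.distutils_workaround mynumpy-proj:file:scipy-deps.txt"
export PIP_BUILD_CONSTRAINTS="django-hijack:setuptools<59"
pip install --use-pep517 .
```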

uranusjr commented 2 years ago

> Stepping back from the specific request of overriding build dependencies, the problem presented in the top post can be avoided by adding additional logic to how build dependencies are chosen. When a package specifies numpy (for example) as a build dependency, pip can freely choose any version of numpy. Right now it chooses the latest simply because that’s the default logic. But we could instead condition the logic to prefer matching the run-time environment where possible, which would keep the spirit of build isolation while also solving the build/run-time ABI mismatch problem.

Some more thoughts I’ve had during the past year on this idea. Choosing a build dependency matching the runtime one is the easy part; the difficult part is that the runtime dependency version may change during resolution, i.e. when backtracking happens. And when that happens, pip will need to also change the build dependency, because there’s no guarantee the newly changed runtime dependency has ABI compatibility with the old one. And here’s where the fun part begins. By changing the build dependency, pip will need to rebuild that source distribution, and since there’s no guarantee the rebuild will have the same metadata as the previous build, the resolver must treat the two builds as different candidates. This creates a weird these-are-the-same-except-not-really problem that’s much worse than PEP 508 direct URLs, since those builds likely have the same name, version (these two are easy), source URL (!), and wheel tags (!!). It’s theoretically all possible to implement, but the logic would need a ton of work.

> I imagine a solution in which pip offers options to extend and constrain build dependencies at install time.

And to come back to the “change the build dependency” thing. There are fundamentally two cases where an sdist’s build dependencies need to be overridden:

  1. The dependencies are declared in a way that can arguably be considered “correct”, but I want the resolver to interpret them more smartly. This is the case for the ABI compatibility use case, and I think there are better solutions for that.
  2. The dependencies are just declared wrong and I need to change them to something else (e.g. add or remove a dependency, or make the version range wider). This kind of use case is fundamentally the same as #8076 but for build dependencies, and I think the same logic applies. IMO allowing direct dependency overriding is too heavy-handed a solution to be implemented in pip, and we should instead explore ways for the user to hot-patch a package and make pip accept that patched artifact instead. For build dependencies, this means providing a tool to easily extract, fix pyproject.toml, re-package, and seamlessly tell pip to use that new sdist (a rough sketch of such a workflow follows below). pip likely still needs to provide some mechanism to enable the last “seamlessly tell pip” part, but the rest of the workflow does not belong in pip IMO, but in a separate tool. (It would be a pip plugin if pip had a plugin architecture, but it does not.)
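
As a rough sketch of that extract/patch/re-package workflow using only existing tools (the package name, version, and edits are hypothetical):

```shell
# Hypothetical example: fix the build requirements declared by somepkg 1.2.3.
pip download --no-deps --no-binary :all: somepkg==1.2.3   # fetch the sdist
tar xzf somepkg-1.2.3.tar.gz
# ...edit somepkg-1.2.3/pyproject.toml, adjusting [build-system] requires...
python -m build --sdist somepkg-1.2.3                     # re-package (needs the 'build' package)
pip install somepkg-1.2.3/dist/somepkg-1.2.3.tar.gz       # point pip at the patched sdist
```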

rgommers commented 2 years ago

> And here’s where the fun part begins. By changing the build dependency, pip will need to rebuild that source distribution, and since there’s no guarantee the rebuild will have the same metadata as the previous build, the resolver must treat the two builds as different candidates.

I'm not sure I agree with that. Yes, it's technically true that things could now break - but it's a corner case related to the ABI problem, and in general

A few thoughts I've had on this recently:

uranusjr commented 2 years ago

I agree it should mostly work without the rebuilding part, but things already mostly work right now, so there is only value in doing anything for this use case if we can go beyond mostly and make things fully work. If a solution can’t cover that last mile, we should not pursue it in the first place, because it wouldn’t really improve the situation meaningfully.

Later in the previous comment I listed the two scenarios in which people generally want to override metadata. The former case is what “mostly works” right now, and IMO we should either not do anything about it (because what we already have is good enough), or pursue the fix to its logical destination and fix the problem entirely (which requires the resolver implementation I mentioned).

The latter scenario, unlike the former, is one where we don’t currently have a solution that even “mostly” works, so there is something to be done; but I’m also arguing that that something should not be built directly into pip.

pfmoore commented 2 years ago

Looking at this issue and the similar one reported in #10731, are we looking at this from the wrong angle?

Fundamentally, the issue we have is that we don't really support the possibility of two wheels, with identical platform tags, for the same project and version of that project, having different dependency metadata. It's not explicitly covered in the standards, but there are a lot of assumptions made that wheels are uniquely identified by name, version and platform tag (or more explicitly, by the wheel filename).

Having scipy wheels depend on a specific numpy version that's determined at build time violates this assumption, and there's going to be a lot of things that break as a result (the pip cache has already been mentioned, as has portability of the generated wheels, but I'm sure there will be others). I gather there's an oldest-supported-numpy package these days, which I assume encodes "the right version of numpy to build against". That seems to me to be a useful workaround for this issue, but the root cause here is that Python metadata really only captures a subset of the stuff that packages can depend on (manylinux hit this in a different context). IMO, allowing users to override build requirements will provide another workaround[^1] in this context, but it won't fix the real problem (and honestly, expecting the end user to know how to specify the right overrides is probably optimistic).

If we want to properly address this issue, we probably need an extension to the metadata standards. And that's going to be a pretty big, complicated discussion (general dependency management for binaries is way beyond the current scope of Python packaging).

Sorry, no answers here, just more questions 🙁

[^1]: Disabling build isolation is another one, with its own set of problems.
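
For reference, the build-isolation workaround mentioned in that footnote looks roughly like this; the pins are placeholders, and with isolation disabled the user is responsible for installing every build dependency up front:

```shell
# Install the build dependencies into the current environment by hand, pinned as needed...
pip install "setuptools<59" wheel "numpy==1.21.4"
# ...then tell pip not to create an isolated build environment.
pip install --no-build-isolation .
```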

pradyunsg commented 2 years ago

I think being able to provide users with a way to say "I want all my builds to happen with setuptools == 56.0.1" is worthwhile, even if we don't end up tackling the binary compatibility story. That's useful for bug-for-bug compatibility, for ensuring that you have deterministic builds, and more.


I think the "fix" for the binary compatibility problem is a complete rethink of how we handle binary compatibility (which is a lot of deeply technical work), which needs to pass through our standardisation process (which is a mix of technical and social work). And I'm not sure there's either appetite or interest in doing all of that right now. Or if it would justify the churn budget costs.

If there is interest and we think the value is sufficient, I'm afraid I'm still not quite sure how tractable the problem even is, or where we'd want to draw the line on what we want to bother with.

I'm sure @rgommers, @njs, @tgamblin and many other folks will have thoughts on this as well. They're a lot more familiar with this stuff than I am.

As for the pip caching issue, I wonder if there's some sort of cache busting that can be done with build tags in the wheel filename (generated by the package). It won't work for PyPI wheels, but it should be feasible to encode build-related information in the build tag for the packages that people build themselves locally. This might even be the right mechanism to try, since it uses existing semantics, toward solving some of these issues.

Regardless, I do think that's related but somewhat independent of this issue.

pradyunsg commented 2 years ago

To be clear, build tags are a thing in the existing wheel file format: https://www.python.org/dev/peps/pep-0427/#file-name-convention
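
Per that convention the filename is {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, and the optional build tag must start with a digit. The names below are made up, purely to illustrate where locally relevant build information could be encoded:

```text
scipy-1.7.3-cp310-cp310-manylinux_2_17_x86_64.whl     # no build tag (typical index wheel)
scipy-1.7.3-1local-cp310-cp310-linux_x86_64.whl       # "1local" build tag on a locally built wheel
```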

rgommers commented 2 years ago

@pfmoore those are valid questions/observations I think - and a lot broader than just this build reqs issue. We'd love to have metadata that's understood for SIMD extensions, GPU support, etc. - encoding everything in filenames only is very limiting.

> (and honestly, expecting the end user to know how to specify the right overrides is probably optimistic).

This is true, but it's also true for runtime dependencies - most users won't know how that works or if/when to override them. I see no real reason to treat build and runtime dependencies in such an asymmetric way as is done now.

> If we want to properly address this issue, we probably need an extension to the metadata standards. And that's going to be a pretty big, complicated discussion (general dependency management for binaries is way beyond the current scope of Python packaging).

Agreed. It's not about dependency management of binaries though. There are, I think, 3 main functions of PyPI:

  1. Be the authoritative index of Python packages, and the channel through which open source code flows from authors to redistributors (Linux distros, Homebrew, conda-forge, etc.)
  2. Let end users install binaries (wheels)
  3. Let end users install from source (sdists)

This mix of binaries and from-source builds is the problem, and in particular - also for this issue - (3) is what causes most problems. It's naive to expect that from-source builds of packages with complicated dependencies will work for end users. This is obviously never going to work reliably when builds are complex and have non-Python dependencies. An extension of metadata alone is definitely not enough to solve this problem. And I can't think of anything that will really solve it, because even much more advanced "package manager + associated package repos" systems, where complete metadata is enforced, don't do both binary and from-source installs in a mixed fashion.

> And I'm not sure there's either appetite or interest in doing all of that right now. Or if it would justify the churn budget costs.

I have an interest, and some budget, for thoroughly documenting all the key problems that we see for scientific & data-science/ML/AI packages in the first half of next year, so that we're at least on the same page about what the problems are and can discuss which ones may be solvable and which ones are going to be out of scope.

> Regardless, I do think that's related but somewhat independent of this issue.

agreed

pfmoore commented 2 years ago

I agree that being able to override build dependencies is worthwhile; I just don't think it'll necessarily address all of the problems in this space (e.g., I expect we'll still get a certain level of support questions from people about this, and "you can override the build dependencies" won't be seen as an ideal solution - see https://github.com/pypa/pip/issues/10731#issuecomment-995544692 for an example of the sort of reaction I mean).

> To be clear, build tags are a thing in the existing wheel file format

Hmm, yes, we might be able to use them somehow. Good thought.

> And I'm not sure there's either appetite or interest in doing all of that right now. Or if it would justify the churn budget costs.

I think it's a significant issue for some of our users, who would consider it justified. The problem for the pip project is how we spend our limited resources - even if the packaging community[^1] develops such a standard, should pip spend time implementing it, or should we work on something like lockfiles, or should we focus on critically-needed UI/UX rationalisation and improvement - or something else entirely?

> I see no real reason to treat build and runtime dependencies in such an asymmetric way as is done now.

Agreed. This is something I alluded to in my comment above about "UI/UX rationalisation". I think that pip really needs to take a breather from implementing new functionality at this point, and tidy up the UI. And one of the things I'd include in that would be looking at how we do or don't share options between the install process and the isolated build environment setup. Sharing requirement overrides between build and install might just naturally fall out of something like that.

But 🤷, any of this needs someone who can put in the work, and that's the key bottleneck at the moment.

[^1]: And the same problem applies for the packaging community, in that we only have a certain amount of bandwidth for the PEP process, and we don't have a process for judging how universal the benefit of a given PEP is. Maybe that's something the packaging manager would cover, but there's been little sign of interaction with the PyPA from them yet, so it's hard to be sure.

pradyunsg commented 2 years ago

/cc @s-mm since her ongoing work has been brought up in this thread!

tgamblin commented 2 years ago

@rgommers:

> We'd love to have metadata that's understood for SIMD extensions, GPU support, etc.

I think this is relevant, as we (well, mostly @alalazo and @becker33) wrote a library for this and factored it out of Spack -- initially for CPU micro-architectures (and their features/extensions), but we're hoping GPU ISAs (compute capabilities, whatever) can also be encoded.

The library is archspec. You can already pip install it. It does a few things that might be interesting for package management and binary distribution. It's basically designed for labeling binaries with uarch ISA information and deciding whether you can build or run that binary (see the usage sketch after the list below). Specifically, it:

  1. Defines a compatibility graph and names for CPU microarchitectures (defined in microarchitectures.json)
  2. It'll detect the host microarchitecture (on macOS and Linux so far)
  3. You can ask things like "is a zen2 binary compatible with cascadelake?", or "will an x86_64_v4 binary run on haswell?" (we support generic x86_64 levels, which are also very helpful for binary distribution)
  4. You can query microarchitectures for feature support (does the host arch support avx512?)
  5. You can ask, given a compiler version and a microarchitecture, what flags are needed for that compiler to emit that uarch's ISA. For things like generic x86-64 levels we try to emulate that (with complicated flags) for older compilers that do not support those names directly.
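
For a feel of the library, here is a small sketch based on archspec's documented archspec.cpu API (illustrative only; the printed values depend on the machine, and the flag query can raise if the compiler is too old for the target):

```python
import archspec.cpu

host = archspec.cpu.host()                      # detect the host microarchitecture
print(host.name)                                # e.g. "zen2" or "cascadelake"

# Compatibility follows the microarchitecture DAG:
# "is a haswell binary runnable on this host?"
print(archspec.cpu.TARGETS["haswell"] <= host)

# Feature query: does the host support the AVX-512 foundation instructions?
print("avx512f" in host.features)

# Flags needed for gcc 9.3.0 to emit this microarchitecture's ISA
print(host.optimization_flags("gcc", "9.3.0"))
```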

We have gotten some vendor contributions to archspec (e.g., from AMD and some others), but if it were adopted by pip, I think we'd get more, so maybe a win-win? It would be awesome to expand the project because I think we are trying to solve the same problem, at least in this domain (ISA compatibility).

More here if you want the gory details: archspec paper

tgamblin commented 2 years ago

@pradyunsg:

> I think being able to provide users with a way to say "I want all my builds to happen with setuptools == 56.0.1" is worthwhile, even if we don't end up tackling the binary compatibility story.

Happy to talk about how we've implemented "solving around" already-installed stuff and how that might translate to the pip solver. The gist of that is in the PackagingCon talk -- we're working on a paper on that stuff as well and I could send it along when it's a little more done if you think it would help.

I think fixing a particular package version isn't actually all that hard -- I suspect you could implement that feature mostly with what you've got. The place where things get nasty for us is binary compatibility constraints -- at the moment, we model the following on nodes and can enforce requirements between them:

The big thing we are working on right now w.r.t. compatibility is compiler runtime libraries for mixed-compiler (or mixed compiler version) builds (e.g., making sure libstdc++, openmp libraries, etc. are compatible). We don't currently model compilers or their implicit libs as proper dependencies and that's something we're finally getting to. I am a little embarrassed that I gave this talk on compiler dependencies in 2018 and it took a whole new solver and too many years to handle it.

The other thing we are trying to model is actual symbols in binaries -- we have a research project on the side right now to look at verifying the compatibility of entry/exit calls and types between libraries (ala libabigail or other binary analysis tools). We want to integrate that kind of checking into the solve. I consider this part pretty far off at least in production settings, but it might help to inform discussions on binary metadata for pip.

Anyway, yes we've thought about a lot of aspects of binary compatibility, versioning, and what's needed as far as metadata quite a bit. Happy to talk about how we could work together/help/etc.

rgommers commented 2 years ago

> The library is archspec. You can already pip install it. ... More here if you want the gory details: archspec paper

Thanks @tgamblin. I finally read the whole paper - looks like amazing work. I'll take any questions/ideas elsewhere to not derail this issue; it certainly seems interesting for us though, and I would like to explore if/how we can make use of it for binaries of NumPy et al.

webknjaz commented 11 months ago

> After pyproject.toml: If scipy uses requires = ["numpy"], then you get a forced upgrade of numpy and all the other issues described above, but it does work. Not so great

FTR, one workaround that hasn't been mentioned in the thread is supplying a constraints file via the PIP_CONSTRAINT environment variable. This does work for pinning the build deps, and as of today it is probably the only way for the end user to influence the build env.
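
A minimal sketch of that workaround (the file name and pins are placeholders):

```shell
# build-constraints.txt might contain, for example:
#   setuptools<59
#   numpy==1.21.4
# PIP_CONSTRAINT also applies inside pip's isolated build environments.
PIP_CONSTRAINT=build-constraints.txt pip install .
```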