pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Determining cross-platform resolution strategy #13111

Open ofek opened 3 months ago

ofek commented 3 months ago

Description

I was attempting to fix this issue, but after digging deep into the code base, it appears that pip (and packaging, at least currently) are fundamentally incapable of cross-platform resolution and any path forward would require a determination from maintainers as to the desired course of action.

As a basic example, let's say a user running on Windows wishes to output the wheels required to install a set of direct dependencies on Ubuntu 14.04. To accurately determine whether a wheel encountered during resolution is supported, one must know at least the entire set of marker values and the allowed platform tags. There are two ways of doing this:

  1. Use a lock file and assume pip is also the installer. - This entails saving the entire index resolution in a file and coming up with a format for said file. At the point of installation, pip would determine the allowed platform tags based on data taken from the system. This is the approach taken by basically every other tool, like uv and Poetry.
  2. Require user-supplied resolution constraints. - This means coming up with a bespoke file format for storing an environment marker mapping, platform tag array, and potentially other data that is required. This is similar to the approach of Pex.
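Whichever approach is chosen, the compatibility check described above needs the target's platform tags, which cannot be read from the Windows host. The following is a minimal stdlib-only sketch of the tag half of that check. `TargetEnvironment`, `wheel_tags` and `is_supported` are hypothetical names; the tag list for Ubuntu 14.04 (which shipped CPython 3.4) is abridged and purely illustrative; and a real tool would use `packaging.utils.parse_wheel_filename` and `packaging.tags` rather than hand-rolled parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetEnvironment:
    """Hypothetical description of the machine we are resolving *for*."""
    # Environment marker values, used for dependency markers (not shown here).
    markers: dict[str, str]
    # Supported tags as "interpreter-abi-platform", most preferred first.
    supported_tags: tuple[str, ...]


def wheel_tags(filename: str) -> set[str]:
    """Expand the (possibly compressed) tag triple in a wheel filename.

    Names look like name-version(-build)?-python-abi-platform.whl, and each
    tag component may hold several values separated by '.'.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"invalid wheel filename: {filename}")
    pythons, abis, platforms = parts[-3:]
    return {
        f"{py}-{abi}-{plat}"
        for py in pythons.split(".")
        for abi in abis.split(".")
        for plat in platforms.split(".")
    }


def is_supported(filename: str, target: TargetEnvironment) -> bool:
    """A wheel is installable on the target if any of its tags is supported."""
    return not wheel_tags(filename).isdisjoint(target.supported_tags)


# Abridged, illustrative description of an Ubuntu 14.04 / CPython 3.4 target.
ubuntu_1404 = TargetEnvironment(
    markers={"sys_platform": "linux", "platform_machine": "x86_64",
             "python_version": "3.4"},
    supported_tags=("cp34-cp34m-manylinux1_x86_64", "cp34-cp34m-linux_x86_64",
                    "cp34-none-any", "py3-none-any"),
)
```

With this target, a hypothetical `example_pkg-1.0-cp34-cp34m-win_amd64.whl` is rejected even though it would be accepted by a tag check run against the Windows host itself.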

The path expressed as preferred in the linked issue was 2, but I don't think that is a good idea, for a few reasons:

Although I perceive that approach to be the maintainers' preference, I think almost no one would use it in practice, so it would be wasted effort, especially when a lock approach is possible and would actually deliver the expected UX of --platform=... in a more predictable manner.

I'm curious to hear the thoughts of maintainers and whether they think that a better path would be to wait for Brett's lock file proposal.

Describe the solution you'd like

N/A

Alternative Solutions

Officially assert that cross-platform resolution will be unsupported

Additional context

Random notes:


jsirois commented 3 months ago

@ofek, as a point of reference, Pex uses off-the-shelf pip with minimal runtime patches to achieve this; so although it's true pip doesn't support cross-platform resolution today, it's not far off at all either. Here are the patches:

jsirois commented 3 months ago

And @ofek, I think you misunderstand your approach 2. Pex supports 3 target types (really 4), which is what your point 2 addresses:

  1. LocalInterpreter
  2. AbbreviatedPlatform
  3. CompletePlatform
  4. Universal

And you can lock using any of these. In fact you can multi-lock using any combo of the 1st 3 or else produce a single universal lock. They are orthogonal concepts. The target is the target of a resolve or a lock, etc.
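As a rough model of the rule described above (not Pex's actual code or command-line interface), the four target kinds can be treated as an enum and a lock request classified: any non-empty mix of the first three kinds gives a multi-lock, while a universal lock stands alone. `TargetKind` and `classify_lock_request` are hypothetical names.

```python
from enum import Enum, auto


class TargetKind(Enum):
    """Hypothetical mirror of the four target types listed above."""
    LOCAL_INTERPRETER = auto()
    ABBREVIATED_PLATFORM = auto()
    COMPLETE_PLATFORM = auto()
    UNIVERSAL = auto()


def classify_lock_request(targets: list[TargetKind]) -> str:
    """Return "multi" or "universal" for a requested set of lock targets.

    Any non-empty combination of the first three kinds is a multi-lock
    (conceptually one resolve per target). UNIVERSAL must appear alone and
    yields a single lock meant to cover all environments.
    """
    if not targets:
        raise ValueError("at least one target is required")
    if TargetKind.UNIVERSAL in targets:
        if len(targets) != 1:
            raise ValueError("a universal lock cannot be combined with other targets")
        return "universal"
    return "multi"
```

Keeping "what to resolve for" (the target) separate from "what to produce" (a resolve or a lock) is what makes the two concepts orthogonal in this model.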

pfmoore commented 3 months ago

it appears that pip (and packaging, at least currently) are fundamentally incapable of cross-platform resolution and any path forward would require a determination from maintainers as to the desired course of action.

I think it would be fair to say that cross-platform installs were never a core goal for pip, and the --platform, etc. flags were an incomplete attempt to add something without thinking through the implications. For example, pip has no way of reliably installing from an sdist that includes native extensions for a different platform - not least because build backends haven't come up with a standard way of supporting cross-compilation, so there's nothing for pip to work with.

Improving what pip can do would be worthwhile, but as was pointed out in https://github.com/pypa/pip/issues/11664, this will likely involve either getting the user to specify more of the target environment's features (not ideal; it's messy enough already) or using some form of "user friendly specification to environment description" translation. Such a translation should be available across tools, so it should either go into packaging (if they were willing to accept it), go into a 3rd party library we can vendor (if someone wanted to write it), or be defined as a standard. What I don't think we should do is try to put such platform determination logic into pip, as then other tools will have to reimplement it[^1] and we'll end up with discrepancies between tools.
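To illustrate what a "user friendly specification to environment description" translation involves, here is a hypothetical sketch; `describe`, its `os_arch`/`python` parameters and the `_OS_FACTS` table are illustrative, not an existing packaging API. It fills in the marker values that genuinely follow from a short spec and leaves everything else as `None`, so callers can see which values were not determined instead of having them silently guessed.

```python
from __future__ import annotations

# Marker variables from the PyPA dependency-specifier spec (excluding "extra").
MARKER_NAMES = (
    "os_name", "sys_platform", "platform_machine",
    "platform_python_implementation", "platform_release", "platform_system",
    "platform_version", "python_version", "python_full_version",
    "implementation_name", "implementation_version",
)

# Hypothetical lookup: marker values implied by a friendly OS/arch name.
_OS_FACTS = {
    "linux-x86_64": {"os_name": "posix", "sys_platform": "linux",
                     "platform_system": "Linux", "platform_machine": "x86_64"},
    "windows-amd64": {"os_name": "nt", "sys_platform": "win32",
                      "platform_system": "Windows", "platform_machine": "AMD64"},
}


def describe(os_arch: str, python: str) -> dict[str, str | None]:
    """Translate e.g. describe("linux-x86_64", "3.12") into marker values.

    Values the short spec cannot determine (kernel release, OS build string,
    the patch version if none was given, ...) are left as None, not guessed.
    """
    env: dict[str, str | None] = dict.fromkeys(MARKER_NAMES)
    env.update(_OS_FACTS[os_arch])  # KeyError for unknown names, by design
    env["python_version"] = ".".join(python.split(".")[:2])
    if python.count(".") == 2:  # a full version such as "3.12.4" was given
        env["python_full_version"] = python
    # Assumption: CPython. A real spec format would have to say so explicitly.
    env["implementation_name"] = "cpython"
    env["platform_python_implementation"] = "CPython"
    return env
```

Even for a common spec like `("linux-x86_64", "3.12")`, `platform_release`, `platform_version` and `python_full_version` stay unknown, which is exactly the gap that either more user input or explicit, visible guessing has to fill.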

If there ever is support for lock files then there would be essentially two ways to achieve the same outcome, with the lock file approach being superior in terms of reproducibility.

Agreed. But in my view, that says that pip should stop trying to do cross-platform resolves, and leave that to other tools that can tackle the various issues and generate a standards-conforming lockfile which pip can then install from. This is the ideal form of interoperability standards enabling specialised tools doing what they are best at.

There's still a UI issue here, though, as we can't avoid the problem of needing the user to specify the target platform in sufficient detail to do the resolution - it makes little difference whether the necessary marker evaluation is done by a simplified lockfile-install process, or by the full resolver.
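One concrete trap on the marker-evaluation side, whichever component performs it: an evaluator that fills missing values from the running interpreter will quietly answer questions about the host instead of the target (for example, `packaging`'s `Marker.evaluate()` merges the supplied mapping over `default_environment()`). The stdlib-only sketch below uses a deliberately tiny, hypothetical evaluator, `evaluate_strict`, which handles only `name == 'value'` clauses joined by `and`, and refuses to answer when the target description lacks a value.

```python
from __future__ import annotations

import re

# Matches a single clause such as: sys_platform == 'linux'
_CLAUSE = re.compile(r"""^\s*(\w+)\s*==\s*(['"])(.*?)\2\s*$""")


class UnknownMarker(KeyError):
    """Raised when the target description lacks a value a marker needs."""


def evaluate_strict(marker: str, target: dict[str, str | None]) -> bool:
    """Evaluate e.g. "a == 'x' and b == 'y'" against *target* only.

    Deliberately tiny: only '==' clauses joined by ' and ' are supported, and
    values containing ' and ' are not handled. Values are never taken from
    the running interpreter; a missing or None value raises UnknownMarker
    instead of silently falling back to the host.
    """
    result = True
    for clause in marker.split(" and "):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported marker clause: {clause!r}")
        name, _quote, expected = match.groups()
        value = target.get(name)
        if value is None:
            raise UnknownMarker(name)
        # Every clause is still checked, so unknown values are always reported.
        result = result and value == expected
    return result
```

For instance, `evaluate_strict("platform_release == '6.8.0'", {"platform_release": None})` raises `UnknownMarker` rather than consulting the machine doing the resolve.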

[^1]: Remember, pip isn't a library and has no API.

Use a lock file and assume pip is also the installer.

I'm not sure what you mean by this. For a start, it means we'd be waiting for lockfiles to be standardised[^2], and the way that discussion is going, it's likely that cross-platform (or "multi scenario" in the terms being used in that discussion) lockfiles won't be part of that standard. Furthermore, I don't see pip as being a "locker" in terms of that standard - we'll install lockfiles created by other tools, but we won't create them ourselves (with one exception, see below). So all this does is push the problem onto other tools - which is fine by me, I guess, but doesn't seem like it's solving anything.

The only form of "saved resolution" pip could (or should, IMO) support is in terms of recording the result of a pip install run[^3]. And that doesn't help with this problem, as we're talking here about how to improve what pip install does, so recording that after the fact isn't helpful. I suspect what you have in mind is some way of recording a partial resolution, leaving a later installer on the target platform to finalise based on the target environment. But that's exactly what a "multi-scenario locker" does, and it's not how pip's resolver works.

More importantly, users supplying such constraint information would have a poor experience because if we are to prioritize correctness (as people expect from pip) then we couldn't take the approach of Pex and allow for the best-effort guessing of environment markers.

Once again, I agree. The user experience is the key problem here. And it's hard, there's no doubt about that. But any means of cross-platform install will require marker evaluation (if you haven't already, go and read the lockfile PEP discussion for all the painful details on this!) so finding a good UI for letting the user define a target platform is going to be necessary however we want to tackle this. Pip's current approach with --platform, --implementation and --abi flags is flawed, possibly fatally, as it essentially requires the sort of "best-effort guessing" you want to avoid.

So we need something new, and as I've already said, that "something" should probably be in a separate library. That library could offer tools to serialise an environment's definition, store environment definitions with user defined aliases ("production-server", for example), guess a specification based on flags (guessing isn't bad as long as the user knows it's happening...), etc. And it could offer an API for clients to retrieve specs via a standard interface.

Or something else. I'm bad at UI design, so take the above with that in mind. But the core point, that this should be a public library, not a private function within pip, is the key.
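Purely to make the library idea more tangible, here is a minimal sketch that assumes nothing about any existing package: named environment definitions (such as the "production-server" alias mentioned above), JSON serialisation, and an explicit record of which values were guessed so the user can be told. `EnvironmentSpec`, `save_specs`, `load_specs` and `guess_warnings` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EnvironmentSpec:
    """Hypothetical serialisable description of a target environment."""
    markers: dict[str, str]
    tags: list[str]
    # Names of marker values that were guessed rather than supplied or measured.
    guessed: list[str] = field(default_factory=list)


def save_specs(specs: dict[str, EnvironmentSpec]) -> str:
    """Serialise an alias -> spec mapping as JSON text."""
    return json.dumps({alias: asdict(spec) for alias, spec in specs.items()},
                      indent=2, sort_keys=True)


def load_specs(text: str) -> dict[str, EnvironmentSpec]:
    """Inverse of save_specs."""
    return {alias: EnvironmentSpec(**data) for alias, data in json.loads(text).items()}


def guess_warnings(alias: str, spec: EnvironmentSpec) -> list[str]:
    """User-facing messages for every guessed value (empty if none)."""
    return [f"{alias}: {name}={spec.markers[name]!r} was guessed" for name in spec.guessed]


specs = {
    "production-server": EnvironmentSpec(
        markers={"sys_platform": "linux", "python_full_version": "3.12.0"},
        tags=["cp312-cp312-manylinux_2_17_x86_64", "py3-none-any"],
        guessed=["python_full_version"],
    )
}
restored = load_specs(save_specs(specs))
```

Recording guesses alongside the data, rather than only at the moment of guessing, means any tool that later loads the spec can repeat the warning.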

I'm curious to hear the thoughts of maintainers and whether they think that a better path would be to wait for Brett's lock file proposal.

I don't think this problem (cross-platform resolution) is urgent, so I'm happy in general with "wait and see where the ecosystem is going". I'm not convinced that lockfiles will solve this issue - unless things take a surprising turn and we find a way to agree on multi-scenario lockfiles, the only gain we'll get from a standard lockfile is that users can (for example) use PDM's (tool-specific) cross-platform locking and then export a standard lockfile (again within PDM) for the target environment that pip could install using pip install --lockfile=pyproject.lock --target=./lib.

I think the biggest issue is working out a UI for specifying an interpreter/platform. That is work that will have to be done regardless of what tool the user prefers when doing the cross-platform resolve. So if you have funded time to work on cross-platform issues, I think it would be best spent developing a library that addresses that side of the problem.

[^2]: There's no way I want pip to get into the current "invent your own tool-specific lockfile" game 🙁

[^3]: This is what I was referring to as the one exception to the statement "pip is not a locker" - the report from pip install --dry-run --report should contain sufficient information to generate a single-scenario lockfile, and a 3rd party tool could produce a lockfile from that. Or pip could grow a --report-format=lockfile option. But this inherits all of the problems of the cross-platform behaviour of pip install, so it's not a solution to those problems, just an example of them.
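For the single-scenario case in footnote 3, a third-party tool could start from the JSON that `pip install --dry-run --ignore-installed --report report.json ...` writes. The sketch below assumes the documented installation-report layout (an `install` list whose items carry `metadata` and `download_info.archive_info`, plus an `environment` marker mapping) and turns it into hash-pinned requirement lines. `report_to_pins` is a hypothetical helper, not a pip feature, and it inherits the limitation noted above: the output is only valid for the environment pip resolved in, which is why it is written into a header comment.

```python
import json


def report_to_pins(report_text: str) -> list[str]:
    """Turn a pip installation report into hash-pinned requirement lines.

    Assumes the documented report layout. Items without archive hashes
    (e.g. local directories or VCS checkouts) are pinned by version only,
    which may not be reinstallable from an index.
    """
    report = json.loads(report_text)
    lines = []
    for item in report["install"]:
        meta = item["metadata"]
        line = f"{meta['name']}=={meta['version']}"
        archive = item.get("download_info", {}).get("archive_info", {})
        hashes = dict(archive.get("hashes", {}))
        if not hashes and "hash" in archive:  # older single "alg=value" form
            alg, _, value = archive["hash"].partition("=")
            hashes[alg] = value
        for alg, value in sorted(hashes.items()):
            line += f" --hash={alg}:{value}"
        lines.append(line)
    # Record the one scenario this resolution is valid for.
    env = report.get("environment", {})
    header = (f"# single scenario: sys_platform={env.get('sys_platform')}, "
              f"python_full_version={env.get('python_full_version')}")
    return [header] + sorted(lines, key=str.lower)
```

The resulting lines can be written to a requirements file and installed with `pip install --require-hashes -r ...`, but only on a machine matching the recorded scenario.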