pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Speculative: --only-binary by default? #9140

Open pfmoore opened 3 years ago

pfmoore commented 3 years ago

What's the problem this feature will solve? A lot of users are reporting issues when there's no Python 3.9 binary for projects they need, and pip tries to build from source and fails with an obscure error (because the user doesn't have a compiler, or isn't set up to build the relevant packages).

Describe the solution you'd like Pip shouldn't try to build from source if the user isn't prepared to deal with build errors. As it's not possible to know the user's level of expertise, we should err on the side of caution, and by default only allow wheels to be installed. Users who know they need to install from source and have checked that they can do so, can explicitly say so using a new --allow-source flag, which acts as an "opt-in" to source builds.

Alternative Solutions Improve the error messages when a source build fails. This is hard, because the details of what went wrong are entirely the responsibility of the build backend.

Additional context I don't realistically think this can be added without a lot of disruption, but given that significant numbers of projects ship wheels these days, maybe it isn't as unthinkable as it once was. I do think it's worth discussing the implications, if only as a thought experiment, and I don't know where else we could do that apart from here.

One big problem area is that we can't distinguish between "pure Python" projects that are shipped only as sdists, but which only need Python to build, and complex projects that need a compiler. So restricting to wheels only would require an explicit opt-in for some projects which currently install with no issue.

dstufft commented 3 years ago

We could attempt to make the default more intelligent (or maybe just more magical). Basically have the implicit default be that if a wheel is found at all for some project, that project defaults to only allowing wheels.

uranusjr commented 3 years ago

But we can’t know what “all projects” means before deciding whether to set the flag, since dependency information is inside the sdist/wheel 🙃

pfmoore commented 3 years ago

@uranusjr I'm suggesting making --only-binary :all: the default, which doesn't need to know dependency information...

uranusjr commented 3 years ago

Oops, my previous response was toward @dstufft’s “intelligent” suggestion. Sorry for the confusion.

To express my thoughts in more words, I think the “only wheel unless some project needs to compile from source” would be very difficult to implement since the two parts in the logic depend on each other. I would much prefer @pfmoore’s original suggestion of having --only-binary :all: unless the user explicitly allows source distributions.

dstufft commented 3 years ago

The logic isn’t hard and has nothing to do with dependency information.

Current logic is roughly:

  1. Fetch a list of links from the index for project X.
  2. Filter said list of links using the value of --only-binary (among other things like platform tag).
  3. Return list of links for use in the dep solver.

The proposed change only slightly alters the logic in step 2: unless the user has explicitly configured --only-binary, we set its value implicitly by inspecting the entire list of links we’ve discovered for project X and determining whether there is a wheel file or not.

This is simple, and would prevent the breakage that Paul is currently seeing, projects which generally make wheels available, but that haven’t for this version of Python / OS / Whatever.

It wouldn’t change anything for projects which don’t ship wheels at all, some of which will be pure Python, some of which will be compiled code, but in any case there’s no “upgrade to Python 3.9 and suddenly start compiling code” problem for these projects since they are consistent in what they require.
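As a rough sketch of the change to step 2 described above: this is not pip's actual code, and the `Link` type and `filter_links` function are hypothetical names used only for illustration. It also ignores platform tags, which the real filtering step would have to handle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for one file listed on the index."""
    filename: str

    @property
    def is_wheel(self) -> bool:
        return self.filename.endswith(".whl")


def filter_links(links: List[Link], only_binary: Optional[bool] = None) -> List[Link]:
    """Step 2: drop sdists when only-binary is in effect.

    only_binary=None means the user configured nothing, so the value is
    inferred: if the project has any wheel at all, behave as wheels-only.
    """
    if only_binary is None:
        only_binary = any(link.is_wheel for link in links)
    if only_binary:
        return [link for link in links if link.is_wheel]
    return list(links)
```

With this rule a project that has a wheel for 1.0 but only an sdist for 2.0 would resolve to the 1.0 wheel, while a project that ships sdists only is left untouched.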

The biggest issue I see with this is that, in the effort to be smarter about our default so as not to break certain kinds of projects, we make it easier for projects to accidentally break their users. If my project historically did not upload wheels, and then I start uploading wheels with version 3.1, all previous versions suddenly stop working without opting in to some flag. This happens without any obvious change by the user (upgrading versions of pip is an obvious change, but something I install starting to upload wheels is not).

We could work around that problem by trying to reduce the blast radius of the implicit “wheels only” setting, by saying that we will only filter out non wheel links by default that are of the same version of a wheel we’ve found. Thus if we find an sdist for 1.0, 2.0, 3.0, and 3.1 and we find a wheel for 3.1, when we filter the list of links, we will filter it so it has the sdists for 1.0, 2.0, and 3.0 and the wheel for 3.1.

This makes it so that as soon as you upload a wheel for a given version, you’re effectively signaling that not only should a wheel version be preferable, but that the sdist should only be used if explicitly configured to by the user.
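The narrower, per-version rule can be sketched the same way. The `Candidate` type and `filter_per_version` function are hypothetical names, and the sketch assumes any wheel for a version counts, whether or not it matches the user's platform.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical (version, kind) pair; kind is "sdist" or "wheel"."""
    version: str
    kind: str


def filter_per_version(candidates: List[Candidate]) -> List[Candidate]:
    """Drop an sdist only when a wheel exists for that same version."""
    versions_with_wheels = {c.version for c in candidates if c.kind == "wheel"}
    return [
        c for c in candidates
        if c.kind == "wheel" or c.version not in versions_with_wheels
    ]


# The example from the comment: sdists for 1.0, 2.0, 3.0, 3.1 and a wheel for 3.1.
candidates = [
    Candidate("1.0", "sdist"),
    Candidate("2.0", "sdist"),
    Candidate("3.0", "sdist"),
    Candidate("3.1", "sdist"),
    Candidate("3.1", "wheel"),
]
remaining = filter_per_version(candidates)
```

Here `remaining` keeps the sdists for 1.0, 2.0 and 3.0 plus the 3.1 wheel; only the 3.1 sdist is filtered out.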

pfmoore commented 3 years ago

Maybe we simply make --prefer-binary the default (rather than --only-binary)? I didn't suggest that originally because it means that we trigger "why don't I get the latest version?" questions. But maybe that's a less serious breakage?
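To illustrate why preferring binaries can trigger the "why don't I get the latest version?" question, here is a minimal sketch of the two orderings. It is an assumption-laden model, not pip's candidate evaluator: a candidate is just a hypothetical `(version, is_wheel)` pair.

```python
from typing import List, Tuple

# Hypothetical candidate: (version tuple, is_wheel)
Candidate = Tuple[Tuple[int, ...], bool]


def best_candidate(candidates: List[Candidate], prefer_binary: bool) -> Candidate:
    """Pick the candidate an installer would try first."""
    if prefer_binary:
        # Any wheel beats any sdist; version only breaks ties within a kind.
        return max(candidates, key=lambda c: (c[1], c[0]))
    # Default: newest version wins; a wheel beats an sdist of the same version.
    return max(candidates, key=lambda c: (c[0], c[1]))


# A project with a wheel for 3.0 but only an sdist for 3.1.
candidates = [((3, 0), True), ((3, 1), False)]
default_choice = best_candidate(candidates, prefer_binary=False)
binary_choice = best_candidate(candidates, prefer_binary=True)
```

The default ordering picks the 3.1 sdist, while the prefer-binary ordering picks the older 3.0 wheel.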

pradyunsg commented 3 years ago

This makes it so that as soon as you upload a wheel for a given version, you’re effectively signaling that not only should a wheel version be preferable, but that the sdist should only be used if explicitly configured to by the user.

I like it? I think something like 98% of packages on PyPI have wheels in the latest release, so I don't think this is catastrophically bad.

Improve the error messages when a source build fails. This is hard, because the details of what went wrong are entirely the responsibility of the build backend.

IMO one of the improvements we should make here is adding a sentence like: "This failure occurred while trying to generate [a wheel / metadata] for packageName. This is not an error in pip."

This also applies to the proposed approach here too -- clearer error messaging would be good. :)

uranusjr commented 3 years ago

I like it? I think something like 98% of packages on PyPI have wheels in the latest release, so I don't think this is catastrophically bad.

I suspect the number would be significantly lower if you count percentage of downloads instead. There are a bunch of popular pure-Python projects that don’t bother with wheels because the effect is minimal. django-grappelli is one of my favourite examples: it’s popular, well-maintained, regularly released, and has very spotty wheel support. --prefer-binary by default would break a lot of Django setups out there.

pfmoore commented 3 years ago

I think something like 98% of packages on PyPI have wheels in the latest release

I'm pretty sure that's a figure I gave you, and I found the bug in my calculation a bit later 🙁 I need to re-do the sums, but I think it's a lot lower than that, unfortunately.

I suspect the number would be significantly lower if you count percentage of downloads instead.

The number's a lot lower without doing the sums incorrectly 🙂 Sorry about that. I don't have download information, but I'm re-doing the numbers right now, and I'll see what things look like if you factor in "uploaded a file in the last 12 months" as well.

I might try getting download numbers from the BigQuery data for offline analysis. Downloads per project, per year (month?) might be sufficiently interesting, if I can work out how to get that relatively easily in a CSV format or similar.

To confirm, my query has just completed. Comparing "number of projects that distribute sdists but no wheels for their latest version", vs "number of projects that distribute wheels for their latest version", the numbers are almost identical (124508 vs 124782). Looking at projects which have released at least one file in the last year, the values are 32890 and 66635.

So half of all projects, 2/3 of projects active in the last year, have wheels.
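The fractions follow directly from the counts quoted above. This small check assumes the two counts in each pair are disjoint and that projects with neither an sdist nor a wheel in their latest version are ignored.

```python
def wheel_share(sdist_only: int, with_wheels: int) -> float:
    """Fraction of projects whose latest version has at least one wheel."""
    return with_wheels / (sdist_only + with_wheels)


all_projects = wheel_share(124508, 124782)      # roughly one half
active_last_year = wheel_share(32890, 66635)    # roughly two thirds
print(f"all: {all_projects:.3f}, active in the last year: {active_last_year:.3f}")
```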

As I say, I think that however we did this, it would result in a lot of breakage.

dstufft commented 3 years ago

It's a backwards incompatible change, so regardless it's going to break someone. The goal behind my proposal is to limit the blast radius, so that we limit the breakage, either to specific projects, or to specific versions within a project.

I think there's two questions here too:

I'm not sure about the long term "right" answer. I can see an argument that we want to encourage wheels where possible, but I also think that there are some projects that simply cannot be shipped as wheels, and maybe will never be able to be shipped as wheels. We need to figure out if going wheel only by default will end up being worth it, or if we will push too many projects out of viability.

For the second one, I think having the default be to filter out sdists for any project version that has any wheels uploaded solves the main driver of this proposal, without breaking projects that are not shipping wheels (or used to ship wheels, but found out that was problematic). That could be useful as a stepping stone for getting to a wheel-only default (for instance, we could provide a warning when installing from sdist then), or it could be a reasonable end state that solves the surprising accidental sdist install, without dropping support for sdist-only projects by default.

pfmoore commented 3 years ago

I think we could do a lot better if we could somehow identify which projects are "hard" to build from source. I feel like blocking sdists that build into universal wheels is going a bit far. In the most general sense, that's basically impossible, but maybe we could add metadata somewhere (in the simple index?) to mark "pure Python" projects?

I agree it's not clear what the best long term answer is. We're seeing a lot more people using Python nowadays who honestly don't want to, or know how to, deal with building stuff from source. For those people, pip downloading a sdist that needs a compiler to build is almost certainly just a source of problems. But they are also precisely the sorts of user who won't know enough to add --prefer-binary. However, optimising for such users is going to impact a big chunk of our "traditional" user base negatively.

dstufft commented 3 years ago

I wonder if we can leverage PyPI in some way to encourage wheels, or to at least surface better information to highlight which projects don't ship wheels? This might be a better question for discourse? I dunno.

pfmoore commented 3 years ago

I've got a big chunk of downloaded data from PyPI that I am querying to get a better feel for this sort of stuff. The biggest problem is the vast amount of (to be polite) "limited value projects" on there - without some form of insight, it's hard to know for sure whether it's OK to ignore a project called "0html" or "django-3-jet-zupit" - especially when it comes up in the same query as "090807040506030201testpip"...

uranusjr commented 3 years ago

What if PyPI automatically builds the simplest pure Python wheels? There’s recent interest to detect malicious source distributions on PyPI, and the wheel it would produce as the side effect should be able to be reused.

mattip commented 3 years ago

Any more thoughts here? I especially like the idea

... having the default be to filter out sdists, for any project version that has any wheels uploaded, solves the main driver to this proposal, without breaking projects that are not shipping wheels

The metadata option also seems reasonable, then the scientific python community could mark NumPy, Scipy, tensorflow, pytorch as "prefer binary by default" and save a lot of CI and cloud resources.

uranusjr commented 3 years ago

I like the idea as well, maybe with a twist: Versions with only sdist are excluded, unless there are no wheels available at all prior to that version.

Using django-grappelli as an example, this means that:

  1. Wheels are selected for 2.15.1, 2.14.4, 2.14.3, and 2.14.2.
  2. Sdists between 2.14.1 and 2.11.2 are all ignored since there are older wheels.
  3. Wheels from 2.11.1, 2.10.2, 2.10.1, 2.9.1, and 2.8.3 can be selected. Sdists between 2.11.1 and 2.8.3 are all ignored.
  4. Sdists from 2.8.2 downwards are allowed, since there are no wheels available past that version.
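One way to read this rule is sketched below. The function name and data layout are hypothetical, the version list is abbreviated to versions named in the example, and the sketch assumes an sdist is never used for a version that also has a wheel.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def selectable(has_wheel: Dict[Version, bool]) -> List[Tuple[Version, str]]:
    """Return the (version, kind) pairs allowed under the rule, newest first."""
    oldest_wheel = min((v for v, w in has_wheel.items() if w), default=None)
    result = []
    for version in sorted(has_wheel, reverse=True):
        if has_wheel[version]:
            result.append((version, "wheel"))
        elif oldest_wheel is None or version < oldest_wheel:
            # No wheel exists for any older version, so the sdist stays allowed.
            result.append((version, "sdist"))
    return result


# Abbreviated release list from the example (True = a wheel was uploaded).
releases = {
    (2, 15, 1): True,
    (2, 14, 2): True,
    (2, 14, 1): False,
    (2, 11, 2): False,
    (2, 11, 1): True,
    (2, 8, 3): True,
    (2, 8, 2): False,
}
allowed = selectable(releases)
```

In this abbreviated list the sdist-only versions 2.14.1 and 2.11.2 are skipped because older wheels exist, while the 2.8.2 sdist remains selectable.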

mattip commented 3 years ago

For another data point here is an issue filed by a python3.5 user of cffi where they cannot build with the sdist, and changing the default would have helped them.

mattip commented 3 years ago

Please edit the title binary-only -> only-binary. I always have to check pip --help to figure out the correct spelling.

pradyunsg commented 3 years ago

FWIW, that tells me that we should add an alias for that option.

rgommers commented 3 years ago

+1 for a solution via either package metadata or via a simple rule like "--only-binary :all: is applied if a package has any wheels".

Otherwise it has the risk of becoming a pip-only solution which is hard to understand. Today the problems mostly surface via pip because it's by far the most popular installer, but this is really a PyPI-ecosystem problem where the dual model of offering both source and binary packages and allowing freely mixing those is the root cause.

Sdists from 2.8.2 downwards are allowed, since there are no wheels available past that version.

This does not seem like a good idea. Not only is it harder to understand, it also partially defeats the purpose here. If a package has a very old source-only release (e.g., from the pre-wheels era) then that will be found the moment there's no suitable wheel for a user.

In your particular example, django-grappelli 2.8.2 is from 2016; a user who types pip install django-grappelli almost certainly does not want a version that old.

uranusjr commented 3 years ago

Makes sense. I think it's quite difficult to gauge the actual impact here, since people here all care a lot about Python packaging (for obvious reasons) and likely push for wheels in the projects we are involved in. So I feel the only way to go forward is to actually try to implement this (maybe as a --use-feature first) and see if we can ~survive it~ make it work in real-life usage.

There are probably still some implementation details we need to sort out. Should we go with --prefer-binary or --only-binary by default? How does a user disable this and prefer an sdist with newer version? etc. But I'm going to mark this as "awaiting PR" so anyone can try to come up with something. It's easier to put things into perspective when there is an implementation and test cases ~to object to 😛~.

tacaswell commented 2 years ago

I would propose an alternate path forward. Rather than changing the default behavior of pip to prefer wheels, add a second CLI entry point of pipw (pipb?) which is an alias with the default of --prefer-binary / --only-binary (and maybe rejects any attempt to change source-only installs from pypi and local source installs?). I think adding a 'w' is a much easier mnemonic to remember than the right flag(s).
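A minimal sketch of what such an entry point could look like, assuming it simply forwards to `python -m pip` and injects the existing `--only-binary :all:` option for `install`. The names `pipw`, `build_pip_argv` and `main` are hypothetical, and the sketch only recognises `install` when it is the first argument (no global options before the subcommand).

```python
import subprocess
import sys
from typing import List


def build_pip_argv(args: List[str]) -> List[str]:
    """Translate `pipw <args>` into a pip command line with wheels-only installs."""
    if args and args[0] == "install":
        # Inject the wheels-only default right after the subcommand.
        args = ["install", "--only-binary", ":all:", *args[1:]]
    return [sys.executable, "-m", "pip", *args]


def main() -> int:
    """Hypothetical `pipw` console entry point; returns pip's exit code."""
    return subprocess.call(build_pip_argv(sys.argv[1:]))
```

Running `pipw install numpy` under this sketch would execute `python -m pip install --only-binary :all: numpy`; other subcommands are passed through unchanged.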

As has been mentioned above, pip currently mixes two different things (building and installing from source, and installing from pre-built binaries) and I think it is a mistake to tilt pip even more in favor of being a binary-only package manager. By adding a new CLI entry point it is possible to make whatever changes are needed to make pip behave like a binary package manager without having to worry about breaking existing users.

I think another issue here is a disagreement as to what exactly wheels are for. I have always considered (and I may be the only one to hold this position) the sdist the canonical source of truth for what the released version of the package is on pypi, with the wheels provided for the convenience of the user (the linux wheel spec is "manylinux" which suggests it is a best-effort rather than authoritative artifact!). I think making pip more binary-package-manager-like by default will only reinforce the expectation that projects will (promptly) provide a wheel for your platform / Python version / Python implementation and that one not existing is a "bug".

There was a discussion on the numpy mailing list about the ever expanding number of platforms that projects are expected to provide wheels for becoming unsustainable (the latest beta release of Matplotlib has 21 wheels and we are not yet covering the full Python version/Python implementation/arch/OS matrix https://pypi.org/manage/project/matplotlib/release/3.5.0b1/). If pip is going to keep going down the path of binary packaging, I think there needs to be more discussion about how filling out the build matrix can be lifted from the projects to some centralized build service, as Homebrew, conda-forge, and the Linux distributions do already. Separating the wheels into their own channel/management chain would also make it easier to manage things like updating version pinning on the wheels post-facto (e.g. putting an upper bound on something or banning known-bad version combinations), re-building with updated versions of non-Python dependencies (xref https://github.com/h5py/h5py/issues/1942), or dealing with CVEs.

mattip commented 2 years ago

How can we make the abstract discussion here more concrete? I see a couple of subjects being mixed together:

| topic | possible mitigation |
| --- | --- |
| aliasing only-binary and binary-only | PR to implement, should be the least controversial change suggested here |
| providing a path for naive users to prefer wheels over sdists by making only-binary the default, making prefer-binary the default, or providing a different cli entry point | competing PRs to do these would provide a forum for discussion over the name and/or need for this |
| preferences when using --prefer-binary when sdists are available for newer versions and wheels available for older ones | ??? |
| wider ranging changes in the way wheels are built and distributed for the growing Nd matrix of python-versions/implementations/os-versions/machine-architectures/available-hardware | ??? - mailing list/discourse? |

I apologize if I missed some of the topics here, please feel free to add to the table. The next question is who will do the work ...

uranusjr commented 2 years ago

I’m dropping a link to the RFC proposing to disable install scripts by default for NPM, which would have roughly the same effect as making --only-binary the default (not --prefer-binary). npm/rfcs#488

pfmoore commented 2 years ago

Most of what @mattip says looks right to me.

For the final point, I agree that this needs a wider discussion than just the pip tracker, once we start going beyond the basic "make users opt into building from source" approach. If we want a more complete solution, I'd suggest that interested parties post a proposal on Discourse for new metadata (which would need to be in sdists and exposed on the PyPI simple index, to be usable here) stating at least two things:

  1. Project is pure Python and needs no external tools to build.
  2. Project owners suggest that installers require user opt-in to build from source.
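To make the two proposed statements concrete, here is a sketch of how an installer might consume them. Everything in it is hypothetical: neither field exists in any current metadata standard, and the precedence between them is just one possible policy, not something settled in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceBuildHints:
    """Hypothetical per-project metadata exposed by the index."""
    pure_python: bool = False      # 1. needs no external tools to build
    require_opt_in: bool = False   # 2. owners ask installers to require opt-in


def may_build_from_source(
    hints: SourceBuildHints,
    user_opted_in: bool,
    default_allow: bool = True,
) -> bool:
    """Decide whether an installer should attempt an sdist build."""
    if user_opted_in:
        return True            # an explicit user choice always wins
    if hints.require_opt_in:
        return False           # project owners asked for an opt-in
    if hints.pure_python:
        return True            # a cheap, compiler-free build is considered safe
    return default_allow       # no hints: fall back to the installer default
```

Setting `default_allow=False` would model a wheels-only default in which pure Python sdists still install without any extra flag.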

We'd need buy-in from setuptools at an absolute minimum (if setuptools won't write the relevant metadata to the sdist, it's essentially not going to be available to consumers) and that probably means setuptools needs to add support for PEP 643, as that's the only way we have of getting reliable metadata for sdists. Realistically, no build backend other than setuptools is an issue here, because only setuptools supports both "simple" and "insanely complicated" build processes 🙂

If (and honestly, this seems like a big "if" to me 🙁) we can get commitment from the various parties in the community, then that could become a PEP and implementation. But it feels like something that may be too much for the level of volunteer resource we have, so it would probably need funding to get anywhere. As the scientific/data science community has a strong interest in this, maybe there are grants around the sustainability/build infrastructure area that could be used for something like this?

rgommers commented 2 years ago

I’m dropping a link to the RFC proposing to disable install scripts by default for NPM, which would have roughly the same effect as making --only-binary the default (not --prefer-binary)

That's a very long discussion, but from what I could gather it's only motivated by security. While here we're talking about usability, and the issues around building of complex packages which is likely to fail. So while there may be overlap in impact, the tradeoffs are probably very different.

But it feels like something that may be too much for the level of volunteer resource we have, so it would probably need funding to get anywhere. As the scientific/data science community has a strong interest in this, maybe there are grants around the sustainability/build infrastructure area that could be used for something like this?

If it looks like there will be buy in for this idea from the relevant maintainers/parties, I'd be happy to lead the obtain-funding part.

jedie commented 2 years ago

I think I have come across a relevant problem in this context: on a Raspberry Pi you would like to use binaries from https://www.piwheels.org/

But how do I include the binaries from there in a lock file?

See also:

rgommers commented 2 years ago

That's not relevant to this discussion @jedie. Yours is more a usage question suitable for Stack Overflow. If you do want to discuss a Pip design change, please open a new issue.

pfmoore commented 2 years ago

Let's focus back on the original proposal here.

Do the @pypa/pip-committers as a group want to switch to --only-binary :all: being the default behaviour?

If we do, @rgommers has offered to find funding to make it happen[^1]. But nothing will happen until we reach some sort of consensus. The default is of course to do nothing, but even if that's what people prefer, it would be nice to be explicit, and state clearly that pip considers building from source to be just as fundamental as installing wheels. We could then close this PR and move on.

Details like whether we do --only-binary, --prefer-binary, or @dstufft's hybrid suggestion can be part of the implementation, once we have some level of consensus (and the funding 😉)

For anyone who wants more information @rgommers suggests here that the data science community would benefit from --only-binary :all: being the default.

FWIW, I'm +1 on making this change.

[^1]: Questions like "how do we transition", "what would happen to all those users who use pip to build their applications from source", "how do we handle all the hate mail from people affected", would be part of the funded work, so for now we can assume that's "someone else's problem", and concentrate on whether we support the principle.

pfmoore commented 2 years ago

PS One interesting piece of data on the whole issue of sdist building, would be to trawl through the tracker here and identify what proportion of our issues are related to building sdists. I bet it's high. Unfortunately, I don't have the time to do this...

Maybe a label marking such issues would be useful?

pradyunsg commented 2 years ago

I'm on board for this, assuming that we find folks with the right skillset to handle the coordination and communication work for the transition.

This has a very significant amount of churn for certain kinds of users and I imagine certain packages will see breakage as well (because they don't publish wheels). I believe that we will need to have similar levels of communication, change management and user feedback loops as the resolver transition (maybe more!). I think there is relatively minimal implementation work needed here compared to the communication work, but I'm OK with being wrong on this. :)

pradyunsg commented 2 years ago

Maybe a label marking such issues would be useful?

We have C: build logic, which might be what you're looking for?

tacaswell commented 2 years ago

it would be nice to be explicit, and state clearly that pip considers building from source to be just as fundamental as installing wheels.

I am a bit shocked by this statement and how far apart my understanding as a user is from the maintainers!

As a user (and project maintainer) I have always understood pip to be for source installations / sdists first and wheels as a secondary "nice to have" convenience for users (saving time and the need to have compilers + system dependencies setup). If I want binary-only builds I would reach for one of the ecosystems of binary artifacts (conda, macports, homebrew, any linux package manager, winpython, canopy, pythonxy, ...).


If pip does go this route I think what you need the resources for is to build a centralized build system to build wheels for the matrix of OS/platform/Python versions in a systematic way, salary for people to keep an eye on it, and to boot-strap a community of volunteers for the care-and-feeding of the build system (basically build a parallel version of conda-forge).

uranusjr commented 2 years ago

I am supportive of either --only-binary or --prefer-binary being a default, but we need to first add a flag to not prefer/exclude sdist for a particular package.

tacaswell commented 2 years ago

I had a chat with @rgommers and have hidden my previous comment (deleting it seemed too much, but no one should read it ;) ). In the status quo pypi+pip mixes a couple of functions that are each hard on their own and nearly impossible to get right simultaneously. That in turn is creating a bunch of maintenance pain in downstream packages. Anything that makes progress towards separating out these functions is a good step forward in the long term.

I am concerned that this will reinforce the expectation that projects with C extensions provide wheels for an (ever-growing) matrix of OS x Python version x Python implementation x architecture. However, that expectation (for better or worse) already exists, and this is going to reduce the maintenance burden in other ways, so it is a net win.

uranusjr commented 2 years ago

I can’t really speak for this, but it is my understanding that helping projects build wheels is indeed one of the long term Python packaging goals (not by pip maintainers but the community as a whole). For the current time, however, tools like cibuildwheel should make wheel-building easy enough for the majority of cases when combined with modern public CI infrastructure.

jezdez commented 2 years ago

Let's focus back on the original proposal here.

Do the @pypa/pip-committers as a group want to switch to --only-binary :all: being the default behaviour?

I'm strongly in favor of this (+1) and would be happy to help with finding funding for this effort.

If we do, @rgommers has offered to find funding to make it happen[^1]. But nothing will happen until we reach some sort of consensus. The default is of course to do nothing, but even if that's what people prefer, it would be nice to be explicit, and state clearly that pip considers building from source to be just as fundamental as installing wheels. We could then close this PR and move on.

Details like whether we do --only-binary, --prefer-binary, or @dstufft's hybrid suggestion can be part of the implementation, once we have some level of consensus (and the funding 😉)

For anyone who wants more information @rgommers suggests here that the data science community would benefit from --only-binary :all: being the default.

Thanks for referencing this, @rgommers: I'd be happy to collaborate on this given the overlap of goals with the conda project, where we want to improve the support for wheel files, too. I'd suggest forming an ad-hoc working group to explore how we can move this forward before finding the funding, given the complexity of the topic.

FWIW, I'm +1 on making this change.

Footnotes

  1. Questions like "how do we transition", "what would happen to all those users who use pip to build their applications from source", "how do we handle all the hate mail from people affected", would be part of the funded work, so for now we can assume that's "someone else's problem", and concentrate on whether we support the principle.

@pfmoore I think it would make sense to loop-in Shamika Mohanan at the PSF to help with project and community management for this part.

pfmoore commented 2 years ago

@pfmoore I think it would make sense to loop-in Shamika Mohanan at the PSF to help with project and community management for this part.

+1. I should note that I don't personally intend to drive this effort (even though I raised the original issue). I'm happy to do some of the technical work around implementation[^1], but I won't have the bandwidth for the transition/outreach work involved.

Also, I think we should take note of @tacaswell's comments, and use this as an opportunity to much more explicitly state our view on what pip's role is within the packaging ecosystem (which, in my view, is that it's an installation tool, which has over the years been pressed into service as a development workflow tool, mostly due to the lack of anyone willing to work on a dedicated tool to fit that niche). A lot of the pushback here will likely be due to people having conflicting views on what pip is for, and while being explicit won't alter the fact that we're making some people's workflows more difficult, it will at least help explain why.

[^1]: Which, to be honest, is relatively straightforward.

pradyunsg commented 2 years ago

Looping in @s-mm, since there's a bunch of discussion about fundraising, community management and project management.

jezdez commented 2 years ago

Also, I think we should take note of @tacaswell's comments, and use this as an opportunity to state much more explicitly our view on what pip's role is within the packaging ecosystem (which, in my view, is that it's an installation tool, which has over the years been pressed into service as a development workflow tool, mostly due to the lack of anyone willing to work on a dedicated tool to fit that niche).

To be fair, pip started as a development workflow tool that catered to a particular type of workflow (VCS backends anyone? 😬) and has (at least from my PoV) just gotten better and more focused on those particular use cases and real-world needs. I think that also entrenched the workflow parts that every once in a while get in the way of evolving the ecosystem. I appreciate the recent efforts to implement smaller, dedicated packages and the continuous PEP efforts because of that.

A lot of the pushback here will likely be due to people having conflicting views on what pip is for, and while being explicit won't alter the fact that we're making some people's workflows more difficult, it will at least help explain why.

Yeah, I hear you and would prefer (in the spirit of catering to a growing Python community) to make building wheels a trivial step in the release process, so that the build functionality in pip can eventually be retired (while providing a fallback like pypa/build).

Footnotes

  1. Which, to be honest, is relatively straightforward.

For pip that's true, the majority of work seems to be on a potential build platform side.

rgommers commented 2 years ago

Thanks for your thoughts and offer to help @jezdez!

For pip that's true, the majority of work seems to be on a potential build platform side.

I'd like to point out that while it would be awesome to have such a platform, and I'm up for collaborating on that idea, it's not a prerequisite for defaulting to --only-binary and is a separate (if related) topic. We have wheels today, and the main purpose of switching Pip's default behavior here is to prevent unsuspecting users from trying to build from source when they didn't intend that. This has two benefits:

  1. most importantly, it will remove a significant source of confusion / bug reports / maintenance load because of bad builds (actual build failures, or issues later at runtime).
  2. it will allow projects that now do not upload sdists at all - in order to avoid the problems in (1) - to actually upload their sdists safely.

rgommers commented 2 years ago

Do the @pypa/pip-committers as a group want to switch to --only-binary :all: being the default behaviour?

It sounds like we're good here, with 3 of the most active/senior Pip maintainers (@pfmoore, @pradyunsg, @uranusjr) in favor and no dissenters. Please correct me if I'm wrong @pfmoore, and you're still waiting for replies.

I'm on board for this, assuming that we find folks with the right skillset to handle the coordination and communication work for the transition.

Great point, agreed. I'm a little more optimistic about the level of impact/breakage than you are, but definitely it needs a lot of planning and communication.

Given that we now should talk about who is/are the right people to do the heavy lifting and how much funding is needed, I propose to take this offline to sort that out. I'll send out an email soon to: @pfmoore, @pradyunsg, @uranusjr, @s-mm, @mattip and @jezdez. Please let me know if I missed anyone.

pfmoore commented 2 years ago

You should probably also include @sbidoul (and maybe @dstufft although he's not as active on pip these days).

sbidoul commented 2 years ago

Alternative Solutions Improve the error messages when a source build fails. This is hard, because the details of what went wrong are entirely the responsibility of the build backend.

Is it that hard? For instance, pip could

Of course it is not entirely trivial to implement (especially determining that "wheels are otherwise available"), and users may still not read the message. But since we are talking about funding to improve this, that approach might be feasible.

So to be clear, I am not opposed to changing the default behaviour, but I wanted to push a little bit in the other direction to be sure we weigh that properly as a possible solution, compared to using more of our churn budget with a breaking change.
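
As a rough illustration of the "wheels are otherwise available" check mentioned above, here is a minimal sketch that looks only at release filenames. It relies on the wheel filename layout (`name-version[-build]-python-abi-platform.whl`); the function names and the `example_pkg` filenames are hypothetical, and it deliberately ignores real tag compatibility (for instance, a `py3` wheel would install fine on CPython 3.9), so it is a heuristic and not what pip does.

```python
def wheel_python_tags(filenames):
    """Collect the python tags of every wheel in a list of release files."""
    tags = set()
    for name in filenames:
        if not name.endswith(".whl"):
            continue  # sdists and other files carry no tags
        parts = name[: -len(".whl")].split("-")
        # The python tag is third from the end; compressed tags such as
        # "py2.py3" are split into their individual tags.
        tags.update(parts[-3].split("."))
    return tags


def wheels_otherwise_available(filenames, wanted_tag):
    """True if the release has wheels, but none carrying wanted_tag.

    Hypothetical heuristic: exact tag match only, no compatibility rules.
    """
    tags = wheel_python_tags(filenames)
    return bool(tags) and wanted_tag not in tags


# Made-up release listing: wheels exist for cp38, but not for cp39.
files = [
    "example_pkg-1.0.tar.gz",
    "example_pkg-1.0-cp38-cp38-manylinux1_x86_64.whl",
    "example_pkg-1.0-cp38-cp38-win_amd64.whl",
]
```

With this listing, `wheels_otherwise_available(files, "cp39")` is true, which is the situation where a hint such as "this project publishes wheels, but none for your Python version" could be shown before or after a failed source build.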

rgommers commented 2 years ago

Is it that hard? For instance, pip could

It's not that hard I'd expect, but it also doesn't solve much. The same from-source builds will still fail; the only thing a better error message fixes is that the user is then more likely to open an issue on the correct repo rather than on the Pip repo. That's nice of course (and perhaps worth doing anyway), but it's a minor gain compared to what we're actually trying to do here: stop unsuspecting users from getting those build failures in the first place. That has many benefits, like reducing maintenance load for many projects, allowing projects to drop old manylinux versions (see https://github.com/pypa/warehouse/issues/9640), and allowing projects that now avoid uploading sdists to start uploading them.

pradyunsg commented 2 years ago

FWIW, improving precisely those errors is what I've been working toward in #10421.

pradyunsg commented 2 years ago

FWIW, I just realised that you can try how this would behave today, by setting PIP_ONLY_BINARY=:all: or pip config set global.only-binary :all:.
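
For anyone who would rather try this from a script than change their global configuration, here is a minimal sketch that sets `PIP_ONLY_BINARY=:all:` for a single pip invocation only. The helper names (`wheels_only_env`, `install_wheels_only`) are made up for illustration; the only pip behaviour assumed is the environment variable mentioned above.

```python
import os
import subprocess
import sys


def wheels_only_env(base=None):
    """Return a copy of the environment with PIP_ONLY_BINARY=:all: set."""
    env = dict(os.environ if base is None else base)
    env["PIP_ONLY_BINARY"] = ":all:"
    return env


def install_wheels_only(requirement):
    """Run `pip install` for one requirement with source builds disabled.

    Returns pip's exit code; a non-zero code may simply mean that no
    compatible wheel was found for the requirement or one of its dependencies.
    """
    cmd = [sys.executable, "-m", "pip", "install", requirement]
    return subprocess.run(cmd, env=wheels_only_env(), check=False).returncode
```

Because the variable is set only in the child process environment, the rest of the machine keeps pip's current default.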

rgommers commented 2 years ago

Thanks @pradyunsg, that's good to know. I'll flip this switch on all my machines, and will see if anything falls over. Given that it's recommended for pure Python projects to upload a wheel anyway, I'd expect that if anything fails, that's a good opportunity to go file a request to do just that :)

pradyunsg commented 2 years ago

I know that pip's vendoring process fails when it is run with --only-binary :all: because at least one of the packages we vendor does not upload wheels.

takluyver commented 2 years ago

@tacaswell pointed me to this - we co-maintain h5py, a package which includes compiled extension modules.

I'm in two minds about it. It would certainly help users to understand the situation if they got a message saying "there are no pre-built packages of h5py for your platform - you could try installing from source (by doing...), or get it from elsewhere (conda, brew, apt...)". But I share @tacaswell's concern that demoting sdists from a default install option will reinforce the expectation that maintainers should be providing wheels for a wide range of platforms, which is already a challenge.

@uranusjr mentioned cibuildwheel. We're already using this, and it certainly makes it easier to build wheels for different platforms. But there is still a fair bit of effort involved:

We currently make wheels for x64 on (many)Linux, Mac, Windows, plus aarch64 (64-bit ARM) on Linux. People definitely want the ARM Macs added to that. Then I'm also aware of people using h5py on at least one other architecture (ppc64le) on Linux, Linux with a different C library (musl) which could well mean separate builds for x64 & aarch64, plus a separate Python implementation (PyPy), which could involve anything up to duplicating all of the different build platforms we have for CPython.
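
To make the growth of that build matrix concrete, here is a small counting sketch. The grouping into "current" and "requested" targets and the labels are my reading of the paragraph above, not h5py's actual CI configuration, and it ignores the further multiplication by supported Python versions.

```python
from itertools import product

# Platforms that already get wheels, as (OS / libc flavour, architecture).
current = [
    ("manylinux", "x64"),
    ("mac", "x64"),
    ("windows", "x64"),
    ("manylinux", "aarch64"),
]

# Platforms people are asking for or are known to use.
requested = [
    ("mac", "arm64"),
    ("manylinux", "ppc64le"),
    ("musl", "x64"),
    ("musl", "aarch64"),
]


def build_matrix(platforms, implementations):
    """Every (implementation, OS, architecture) combination to build for."""
    return [
        (impl, os_name, arch)
        for impl, (os_name, arch) in product(implementations, platforms)
    ]


today = build_matrix(current, ["cpython"])
worst_case = build_matrix(current + requested, ["cpython", "pypy"])
print(len(today), len(worst_case))  # 4 16
```

Going from four targets to sixteen is the upper bound described above, where PyPy duplicates every platform built for CPython.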

Apologies, I've gone a bit off-topic here. I wanted to give a bit of a sense of what it's like maintaining a package with extension modules, and what it might mean if tools like pip start suggesting that it's wheels or nothing. I think the impact would depend a lot on how the 'no suitable wheels' error message is worded, and how easy it is to try installing from source after hitting that.