pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Debundling, Technical Debt and Responsibilities #9677

Closed pradyunsg closed 2 years ago

pradyunsg commented 3 years ago

This is an offshoot from #9467. Go ahead and read the most recent few comments for context.

/cc @FFY00 @eli-schwartz @stefanor @doko42 @kitterma @hroncok to put this on your radar. Please feel free to add in other redistributors of pip, who might be interested in this discussion.

/cc @pypa/pip-committers to call me out if I say something wrong here.


First off, I know folks probably don't say this enough, but: hey, thanks for doing what you do. I genuinely do appreciate the work you do, and it's quite a thankless task to keep OSS things functioning. So, hey, thank you!


I've seen it stated in multiple places, by multiple different people now, that because pip has a mechanism to simplify debundling, it should work when debundled as is.

Please don't take the fact that the debundling script exists in this repository to mean that it is somehow pip maintainers' responsibility to ensure that it'll give you something that just works. If that's the expectation being set, I'd like to remove that script from this repo, and make it clearer that it's not something that we want to deal with.

As far as I can tell, the debundling script exists because... how do I phrase this diplomatically... it was very easy for redistributors to debundle pip, get it kinda-sorta working, and ship a broken pip. It is there to make a redistributor's life easier when they're debundling pip, but that cannot come at the cost of making pip maintainers' lives harder.

Quoting from our vendoring policy's debundling-script-is-here section:

However, if you insist on doing so, we have a semi-supported method (that we don’t test in our CI) and requires a bit of extra work on your end in order to solve the problems described above.

The "bit of extra work" is making sure that the debundled pip doesn't fail in any of the ways that our users use it. (I'm saying "our users" because these are users of both pip and $thing)

For years now, we've effectively been saying "don't do that please, because it'll break things. And if you do it anyway, please make sure it works OK, because we don't have the resources to test all the ways it can break for you."

And then, redistributors said "yea, so... it broke because we didn't account for $thing, but if you do this one thing in this specific weird way, our jenga tower stays upright". And we've been accepting (even authoring) such patches because it makes our users' lives easier.


It'd be very easy for pip's maintainers to point to our vendoring policy, and start saying no to patches intended to make things work when pip is debundled (and to revert the ones we've merged over the years already). We're not doing that, but I'd like to get to a point where pip's maintainers don't have to worry about the issues caused by the debundling of pip.

Honestly, pip's maintainers can't be fixing / dealing with these issues - that's literally the point of that policy document. pip getting even more changes and weird type conversions, to accommodate the various breakages due to debundling, is NOT the solution here. It's a fragile workaround that can easily be broken by refactoring, and it makes this code harder to maintain in general. It's technical debt for pip, in exchange for avoiding additional work for downstream redistributors. We've been taking this on for a while, but I'd really like to stop doing that.

Pip's¹ been taking on technical debt for something that we've explicitly documented that we don't want to be taking technical debt for. We are going to have to start saying no to these patches at some point, and I'm sure everyone would prefer it wasn't a thing we started doing magically on a random day. :)

¹ I hope you're smiling, Paul.


So... I do have a few questions now.

For the redistributors who are debundling pip, could you share with us the line of reasoning that leads you to decide that you have to be debundling pip? Feel free to point to specific sections in a bunch of documents. (I promise not to get into the discussion of the merits of those choices, I just wanna know what they are to better understand your situation)

For everyone:

  1. What can be done so that redistributors are not pushing the costs of debundling pip over to pip's maintainers?
  2. Should pip's maintainers start saying no to taking on more technical debt?
  3. Any bright ideas for improving this situation? 🙃

(PS: typed with typo-happy thumbs, on a phone) (PPS: this is definitely a "if I had more time, I'd have written a shorter letter" situation)

hroncok commented 3 years ago

First off, I know folks probably don't say this enough, but: hey, thanks for doing what you do. I genuinely do appreciate the work you do, and it's quite a thankless task to keep OSS things functioning. So, hey, thank you!

❤️

ccing also @encukou @torsava @frenzymadness @stratakis from our team

eli-schwartz commented 3 years ago

We (Arch) consistently debundle both pip and setuptools, so the referenced issue is not a problem for us. Actually, we have seen that issue and consider it a setuptools issue...

I don't see any rationale for devendoring pip, but not devendoring setuptools... is their packaging policy not consistent? Why is it the pip project's responsibility to add code specifically for the case where debian politics prevents timely coordination across packages?

I would prefer pip to work with distros doing devendoring, when it is the technically correct thing for pip to have the solution. Off the top of my head, I don't particularly remember any problems Arch has had in that regard, other than issues and updates to _vendor/__init__.py itself, so I'm optimistic that we can happily work together on this.

I would prefer if Debian were to solve internal Debian bureaucracy by handling it via Debian bureaucratic channels (e.g. a bug report to the setuptools package to devendor setuptools, which is then a blocker before they can devendor pip), rather than trying to solve it by patching pip for a Debian-specific need that does not result in elegant code, and apparently increasing the maintenance burden of pip. :(

pradyunsg commented 3 years ago

@eli-schwartz the PR referenced is for an issue that occurs only on Arch (AFAICT): https://github.com/pypa/pip/issues/9348

If this were specific to Debian, I'd have reached out to the Debian maintainers directly. :)

eli-schwartz commented 3 years ago

As far as I can tell, that too is because of inconsistent devendoring, which we don't do... But in that case, mixing a distro pip and a user installed setuptools breaks.

This is related to https://github.com/pypa/setuptools/issues/1383 and more generally to the golden rule "do not overwrite/override the distro packages with your own, it will break other distro packages".

But originally I only read the PR, not the issue. :)

pradyunsg commented 3 years ago

For the redistributors who are debundling pip, could you share with us the line of reasoning that leads you to decide that you have to be debundling pip? Feel free to point to specific sections in a bunch of documents.

If you could also elaborate on this, that'd be great!

stefanor commented 3 years ago

@pradyunsg: Thanks for starting this discussion.

Please don't take the fact that the debundling script exists in this repository, to mean that it is somehow pip maintainers' responsibility to ensure that it'll give you something that just works.

That's not my expectation. I assume you don't test it. And I see it as Debian's responsibility to help keep this mechanism working, as long as we're taking advantage of it. I am prepared to spend several hours scratching my head and debugging those problems, every now and then.

I expect that there may be push-back on implementation details of those patches, but not to have to re-litigate the existence of the debundling support.

And then, redistributors and said "yea, so... it broke because we didn't account for $thing, but if you do this one thing in this specific weird way, our jenga tower stays upright". And we've been accepting (even authoring) such patches because it makes our users' lives easier.

In the case of #9686, you're not wrong with that description. Finding the cause of the debundling issues that led to it took a few days (on and off). I see @kitterma was previously aware of that issue years ago, but if I was, I'd forgotten all about it... However, these are fairly minor bugs, once understood. The Jenga tower stands fairly well...

For the redistributors who are debundling pip, could you share with us the line of reasoning that leads you to decide that you have to be debundling pip?

Debian Policy: https://www.debian.org/doc/debian-policy/ch-source.html#embedded-code-copies Debian Wiki Page: https://wiki.debian.org/EmbeddedCopies

What are the rationales behind it? Part of it is the definition of a distribution. We collect together software and try to fashion it into a cohesive system. The fewer copies of each thing that's in it, the easier our job is, and the smaller and simpler the system is. That's our security team's view too. They want to minimize the amount of work needed in response to issues.

If we need to patch something in a library, it's really annoying to have to do it in multiple copies of that library. Especially if different people are responsible for maintaining each one. Hopefully that's not necessary, of course.

When a project is targeting a specific version of a library that is out of date and we want to replace it, we'll (often) help them to port their software to the new version. A big distro is full of many dead & semi-dead upstreams, so there is an endless amount of this work to do. (That can be a good hint that a project has died and its package should be removed.)

Of course, every policy has exceptions. Here's the documented list of known embedded copies in Debian; it's far from complete.

Basically, we debundle, where we can. When we can't, we may disable the relevant features, or just carry the embedded copy (distastefully). For web browsers, for example, in practice we have to carry embedded copies of several dependencies to support our stable releases. That's obviously the right trade-off to make in that situation.

Maybe we should be doing the same thing with pip, now that the Python distribution space is evolving faster. Pip is pretty much a user-facing leaf package (not a library being used by other packages). So, we could be shipping updated pip to stable Debian releases. And using bundled dependencies does make that very easy. This is certainly a conversation we can have.

Stable distributions are trying to offer stability, by changing as little as possible. That usually means not shipping updated versions of software, because the updates bring unexpected change. When a project has stabilised enough, and has the testing and engineering resources to reliably produce stable versions that work across a range of platforms, and won't cause too many behavioural regressions, then shipping updates becomes feasible.

It's all a matter of weighing risks to our users.

  1. What can be done so that redistributors are not pushing the costs of debundling pip over to pip's maintainers?

I don't know how much space there is there. Distributors can carry the cost of writing patches, but they'll need review. And there may be knock-on technical debt. I'm not aware of much of that in pip related to debundling, but maybe you can educate me on that.

Supporting debundled use (where the versions may not be exactly what you expect) encourages you to keep libraries at more arms-length from each other, rather than tight integration. I see that as good library design.

  3. Any bright ideas for improving this situation? 🙃

I think the situation is pretty good. pip supports a common use-case for distributor modification directly upstream. We could take that one step further with upstream CI.


@eli-schwartz:

pip has a bit of a "best of breed" approach to vendoring modules which works pretty well, and the debundling process is basically "remove the _vendor/ modules, keep the shim and you're done".

💯
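For readers unfamiliar with the shim being praised here, the core trick is small: when the bundled copies are deleted, the shim aliases the system-wide modules under the vendored import path. This is a condensed, self-contained sketch of that idea (the package name "myapp" and the simplified `vendored()` helper are illustrative; pip's real pip/_vendor/__init__.py is more involved):

```python
import importlib
import sys

# A redistributor's debundling patch flips this flag after deleting the
# bundled copies under _vendor/.
DEBUNDLED = True


def vendored(modulename: str) -> None:
    """Alias the system-wide module under the vendored import name."""
    if DEBUNDLED:
        module = importlib.import_module(modulename)
        # Code written against the vendored path keeps working, but now
        # resolves to whatever version the distro ships system-wide.
        sys.modules[f"myapp._vendor.{modulename}"] = module


# Using a stdlib module as a stand-in for a vendored dependency:
vendored("json")
debundled_json = sys.modules["myapp._vendor.json"]
print(debundled_json.dumps({"debundled": True}))
```

This is exactly why partial debundling is risky: every aliased name silently becomes "whatever the system provides", with no guarantee it matches the version pip was tested against.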

Why is it the pip project's responsibility to add code specifically for the case where debian politics prevents timely coordination across packages?

From my PoV my PR makes pip more robust when devendored. It's not pip's responsibility to take it, but if pip wants it, I'm offering it.

I like to be able to carry the smallest patch-set possible in Debian; it makes our life easier, and means less chance for users to experience something different from what the upstream expects. As distributors we are trying to serve both users and upstreams.

pradyunsg commented 3 years ago

IIRC, we had CI for debundling at some point in the past and we removed it after some discussion. The basic line of reasoning there was (1) our CI setup has wayyyy too many long-ish jobs already (2) that CI can only cover one aspect of the situation and it cannot ensure all the potential combinations of patched/unpatched pip/pip's dependencies work.

(I appreciate the responses here; but I can't respond to those yet because I have to make and eat breakfast)

uranusjr commented 3 years ago

Coming purely from the technical side, I wonder if it's possible to configure mypy to check for this. Maybe a conditional import to tell mypy that pip._vendor.packaging.version.Version and pkg_resources.Version are not interchangeable, even though the latter is an alias at runtime? This would be able to ensure like 90% correctness with a pretty negligible CI addition.
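The shape of that conditional import could look something like the following self-contained toy (this is not pip's code; the class names are stand-ins, and the `if TYPE_CHECKING` branch is the "fake mismatch" being proposed):

```python
from typing import TYPE_CHECKING


class Version:
    """Stand-in for pip._vendor.packaging.version.Version (illustrative)."""

    def __init__(self, v: str) -> None:
        self.v = v


if TYPE_CHECKING:
    # What mypy sees: a distinct, incompatible type, so any code that
    # mixes the two "same" Versions is flagged statically.
    class ExternVersion:
        def __init__(self, v: str) -> None: ...
else:
    # What actually runs: the very same class, as in a bundled pip.
    ExternVersion = Version


def compare(a: Version, b: Version) -> bool:
    return a.v == b.v


# Fine at runtime today; under mypy, passing an ExternVersion here would be
# an error, which is exactly the mismatch a debundled setup can hit for real.
print(compare(Version("1.0"), ExternVersion("1.0")))
```

The appeal is that the check runs in the existing type-checking job, so it adds essentially no CI time.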

pradyunsg commented 3 years ago

It's not pkg_resources.Version, but pkg_resources.extern.packaging.version.Version (which is removed and converted to pip._vendor.packaging.version.Version by our vendoring logic). Debundling pip and setuptools means that it's now possible that they no longer match.
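The failure mode being described can be demonstrated without pip at all: the same class definition loaded under two module names produces two distinct classes, just like a vendored copy sitting next to a system-wide copy. A self-contained illustration (module names are invented for the demo):

```python
import importlib.util
import os
import sys
import tempfile

# The same source, which will be loaded twice under different names.
SOURCE = "class Version:\n    def __init__(self, v):\n        self.v = v\n"


def load_as(path: str, name: str):
    """Import the file at `path` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "version.py")
    with open(path, "w") as f:
        f.write(SOURCE)

    vendored = load_as(path, "vendored_version")  # plays pip's bundled copy
    system = load_as(path, "system_version")      # plays the distro's copy

# Identical source, but two distinct classes: an isinstance check against one
# copy rejects objects built by the other. Mixed debundling breaks this way.
print(vendored.Version is system.Version)
print(isinstance(system.Version("1.0"), vendored.Version))
```

Both prints show the two "identical" classes are not interchangeable, which is why debundling pip and setuptools inconsistently can make version objects from one fail type checks in the other.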

uranusjr commented 3 years ago

Yes, and I’m thinking maybe it’s possible to “fake” that mismatch for the mypy check.

pradyunsg commented 3 years ago

For mypy, they both come from the exact same place: pip._vendor.packaging.version.Version in this repository. Whatever we do to convince mypy to prevent us from "mixing the same type" will be a hack, but... if it's not too horrible, color me interested! :)

eli-schwartz commented 3 years ago

It should be possible to recreate the mismatch by using one copy of the packaging module in its natural location and one copy imported directly from there via the setuptools.extern method. That is, after all, how I originally detected the mismatch.

dstufft commented 3 years ago

I will say that while I generally lean on the side of wishing downstream would not debundle us, I do think it's not entirely a negative thing either. By having at least some downstream users that debundle, it sort of forces us to "stay honest".

By that I mean, I've seen time and time again that projects that bundle some library often end up patching that library locally for one reason or another. Often they start out assuming no local patches, but over time the patches get added, and eventually the bundled copy effectively ends up being a fork of upstream. Having downstreams that debundle forced us to come up with a bundling solution that goes out of its way to try and avoid that from happening, and it provides a constant sort of pressure to help ensure that we don't regress in that aspect. Maybe we don't need that pressure and it would be fine without it, but I do want to recognize that it does have at least some positive for upstream here.

That being said, I do think there are two general types of bugs that can flow from our debundling support. Broadly, that's bugs that are inherent to it (for example, if we miss a vendoring alias) and bugs that are specific to something a downstream distro is doing (like only partially debundling, triggering a mismatched type problem).

The first of those types of bugs are obviously things we should land patches for in pip itself. It would be silly to expect every downstream to carry the same patch to fix some inherent problem in our debundling. Since we don't actually test our debundling and rely on downstream to do that (which is effectively the trade-off we made here: we'll develop and carry this system of debundling, but we're pushing the costs of testing onto the downstream), we're often only going to see those issues as reports (and hopefully patches) from downstream.

The second of those types of bugs I mentally view as similar to things like patches that fix pip on obscure operating systems. We're unlikely to do the work to fix or debug those problems, and we're not going to add CI to ensure that they stay fixed, but if the patch itself isn't going to cause some serious regression and it's already written, then there is little reason to avoid pressing the merge button from our POV. Downstream might prefer to carry that patch themselves, since it's going to be more durable in that case (as was pointed out earlier, it's pretty easy for random workarounds in code paths not tested in CI to break), but they also might prefer to just land it in pip to avoid having to rebase their patch regularly. I think either option is fine.

In general though, I don't think the pip maintainers need to worry too much about fixing issues caused by specific downstream decisions that are not inherent to our debundling support. One of the reasons I made that support require downstreams to explicitly patch pip, was somewhat as a signal that by doing this, there's a chance you might have to carry patches to make it fully work.

pfmoore commented 3 years ago

I agree with pretty much everything @dstufft said, with the minor exception of a "human nature" qualification on one point:

there is little reason to avoid pressing the merge button from our POV

The reservation I have here is that if we don't push back on PRs that patch over obscure debundling problems that are outside what we'd consider the "norm", then that sets an expectation that we are willing to co-ordinate and manage the set of fixes that ends up in pip. And worse still, it leaves us open to the possibility that distribution A offers a PR that fixes their use case, but breaks distribution B. Who catches that problem?

In reality, we don't actually have that difficulty, but it's hard to know for sure to what extent that's because we're relatively conservative in what we accept. I guess as long as no-one wants us to start being more open to accepting fixes for debundling issues than we already are¹, then there's no problem.

¹ Note for context that the fix in #9467 has been merged.

stratakis commented 3 years ago

On Fedora and RHEL we don't debundle pip so far, so that issue wouldn't affect us much, but this is the case possibly because we never looked at debundling it. We lean towards debundling as a distribution though wherever possible despite the fact that sometimes it can be a bit of a hassle (as in the case of pipenv).

The reasons are nicely explained here: https://fedoraproject.org/wiki/Bundled_Libraries?rd=Packaging:Bundled_Libraries

However not debundling pip has caused us problems in the past, especially with libraries like urllib3 which bundle other packages as well, when e.g. we have to backport a CVE fix.

encukou commented 3 years ago

For the redistributors who are debundling pip, could you share with us the line of reasoning that leads you to decide that you have to be debundling pip?

Let me answer a bit more generally than what you asked for. Debundling helps with the things that distros do.

These are much more visible in the more "enterprise"/LTS distros, and with packages that aren't maintained as enthusiastically as pip. Generally, projects lose attention from upstream developers in unpredictable ways and at unpredictable times, so some distros will try to make sure packages are "long-term maintainable" at all times. Having just one copy of each piece of code on the system helps that maintainability quite a lot. The general guidelines are inspired by all the times maintainers got burnt in the past.

Should pip's maintainers start saying no to taking on more technical debt?

Yes. Just say no; you set the rules. But, be aware that debundling helps maintainability in the long run. Testing code in more scenarios will uncover more issues, and some of them will be real. Even if it's just an incompatibility with an upcoming release of a dependency, which you'd find out about in a couple of weeks/months, the heads-up is, IMO, useful. (Or not? You set the rules!)

And as Donald says with "stay honest": I recommend to never give in to the temptation to fork/patch that bundled code. Otherwise you become maintainers of a fork, which is a whole new level of technical debt. More: bundling an older (unsupported or less-supported) version of a library will essentially also turn you into a fork maintainer: if there's a security update in a newer (incompatible) version, you'll need to rush to either restore compatibility or backport the fix.

All in all, I really hope that as distros, we're helping the project. Just in different ways than developers.

pfmoore commented 3 years ago

Let me answer a bit more generally than what you asked for

Thank you. This is useful input.

These are much more visible in the more "enterprise"/LTS distros, and with packages that aren't maintained as enthusiastically as pip.

One frustration here is that pip, having so few maintainers, really isn't able to address the sort of "enterprise" concerns that distros do. That's fine as long as the distros cover this, but when these concerns spill over onto pip, it can get difficult. Particularly as the distros get the enterprise license fees and funding, and we don't...

Yes. Just say no; you set the rules.

Thank you for that. It means a lot to get that support.

All in all, I really hope that as distros, we're helping the project. Just in different ways than developers.

Mostly, yes you are. Policy and priority clashes can be frustrating, and I won't lie, it's hard to be sympathetic when we get a bunch of users saying "pip is broken" but the reality is that what's broken is something the distro did. Some distros trigger more of these than others, and from an outsider's POV it can be hard to understand why that is, or why distros can be so different in this regard. But in general it's a net positive, yes.

encukou commented 3 years ago

Particularly as the distros get the enterprise license fees and funding, and we don't...

Yes. Sadly, I personally can't really help a lot with the politics involved :( I work at Red Hat, but I'm not a manager.

it's hard to be sympathetic when we get a bunch of users saying "pip is broken" but the reality is that what's broken is something the distro did.

Don't be ashamed to reassign the issue to the distros (however that might work). I definitely want to know about all the problems Fedora/RHEL is causing. This goes especially for the distros that have paid customer support: resources are assigned based on customer demand, so redirecting people to them will help point that money toward improving pip integration. That said, if you'd be OK doing a review every once in a while and clicking the merge button, or indeed having discussions like this, it could be better for the project overall.

pfmoore commented 3 years ago

Don't be ashamed to reassign the issue to the distros (however that might work).

We do, but only by saying "you need to talk to your distro" (which is all we know) and often it feels like the user has no clue how to do that, which is frustrating to us because the user reached out to us and we weren't able to help.

Hmm, one thought about how we could point people in the right direction more easily. Maybe we could get a list of the correct support URLs for all distros that debundle pip, and add them to the pip docs. We could also require that distros that debundle add a note to the pip version string saying "patched by XXX" (it would even be easy enough to add a check to pip so that we fail if we're debundled but the note is missing). Then it would be obvious from pip --version and the docs, where we should redirect users to.
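The "fail if we're debundled but the note is missing" check could be sketched roughly as follows (all names here, including the DEBUNDLED flag, the helper, and the "patched by" marker text, are hypothetical, not pip's actual API):

```python
def check_distro_note(version: str, debundled: bool) -> None:
    """Refuse to run a debundled pip whose version lacks a distro note."""
    if debundled and "patched by" not in version:
        raise RuntimeError(
            "This pip has been debundled by a redistributor, but its version "
            "string has no 'patched by <distro>' note. Please report issues "
            "to your distribution rather than to pip upstream."
        )


# A compliant distro build and an unmodified upstream build both pass:
check_distro_note("21.0.1 patched by ExampleDistro", debundled=True)
check_distro_note("21.0.1", debundled=False)

# A debundled build without the note would fail loudly at startup:
try:
    check_distro_note("21.0.1", debundled=True)
except RuntimeError as exc:
    print("would fail:", exc)
```

Since the debundling patch already edits pip, a distro could append its note in the same patch, making compliance nearly free.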

One downside is that it may result in distros getting a whole load of pip issues that aren't related to debundling, but honestly I doubt that, I don't think many users do that much analysis before raising a bug (I know I don't 🙂).

FFY00 commented 3 years ago

We do, but only by saying "you need to talk to your distro" (which is all we know) and often it feels like the user has no clue how to do that, which is frustrating to us because the user reached out to us and we weren't able to help.

Speaking for myself, it is generally helpful if you ping people directly. Maybe pip could maintain a list of distro maintainers, maybe even other volunteers, willing to provide support directly in the bugtracker? I really think closing the communication gap between pip maintainers and distros is the best path forward.

pfmoore commented 3 years ago

Speaking for myself, it is generally helpful if you ping people directly.

If the issue isn't with pip, I'd really rather it got moved onto a distro tracker, and not have the discussion stay on the pip tracker (I have enough trouble already with too many pip notifications). I'm completely in agreement that better communication between distros and pip maintainers is good, but I also think that helping users to understand who is best placed to help them is good - and "@FFY00 on the pip tracker" looks to a user like a pip specialist, not a distro specialist, which IMO makes it harder to educate users.

pradyunsg commented 3 years ago

I think I’d prefer to have a blob of text that I can copy paste for saying “this seems to be due to XYZ distro’s changes. Here’s what you need to do to reach out to them”. Right now, we’re missing next steps guidance for the user, because we don’t know what they need to do to reach your communication channels.

If there’s a place we should send them to, that’s appropriate for you, it’d be great if you just provide it here. I’ll add that into the maintainer documentation when I come around to finishing our documentation rewrite. :)

stefanor commented 3 years ago

Sadly Debian's bug-reporting process is not very beginner-friendly (unless your beard is sufficiently grey and wispy, and you appreciate being able to file bugs with properly formatted text emails). Ubuntu's is less arcane and more web-based.

But here are Debian's bug reporting instructions: https://www.debian.org/Bugs/Reporting And Ubuntu's: https://help.ubuntu.com/community/ReportingBugs

How about this for Debian:

This issue looks like it's caused by changes that Debian made in their pip packaging. Please file a bug with Debian, with reportbug python3-pip (docs). You can link to this issue in your bug report.

In the meantime, you can probably work-around your issue by upgrading pip inside your virtualenv: python -m pip install -U pip

Ubuntu:

This issue looks like it's caused by changes that Ubuntu made in their pip packaging. Please file a bug with Ubuntu, with ubuntu-bug python3-pip (docs). You can link to this issue in your bug report.

In the meantime, you can probably work-around your issue by upgrading pip inside your virtualenv: python -m pip install -U pip

encukou commented 3 years ago

Fedora, RHEL, CentOS (and probably other derivatives – Rocky, Scientific, CloudLinux etc. – if they don't tell you something more specific):

This issue looks like it's caused by changes that Fedora or Red Hat made in their pip packaging. Please file a Fedora bug at https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=python-pip

cc @encukou @hroncok

pradyunsg commented 3 years ago

Thanks for the discussion everyone.

I think the next steps here are for someone to file a PR aggregating the comments into a dev-docs page to copy from. This will likely need to be one of the pip maintainers, since we’d want to word things carefully there.

merwok commented 2 years ago

FTR Stefano Rivera sent this message about stopping the de-bundling in Debian: https://lists.debian.org/debian-python/2021/09/msg00031.html

pradyunsg commented 2 years ago

@eli-schwartz @FFY00 Do you want to share an equivalent blurb for the distros you're involved with, as was shared above for Fedora+"friends" and Debian+"friends"?

FFY00 commented 2 years ago

We will almost certainly keep debundling in Arch. There are some issues that we need to be careful about, mainly making sure we don't update dependencies to incompatible versions and stuff like that. Perhaps @felixonmars could elaborate a bit more, since he is the one that currently maintains the pip package.

pradyunsg commented 2 years ago

That's not what I'm asking. See https://github.com/pypa/pip/issues/9677#issuecomment-790742877 for what I'm asking for here.

FFY00 commented 2 years ago

Ah, sorry! Just point the users to the following URL, to create a new issue in our bug tracker.

https://bugs.archlinux.org/newtask?project=1&product_category=2&item_summary=%5Bpython-pip%5D+PLEASE+ENTER+SUMMARY

pradyunsg commented 2 years ago

Ok, I think Arch is the last remaining holdout on debundling. And, your current approach is causing a significantly degraded/fragile experience when using the Arch-provided pip: https://twitter.com/jpetazzo/status/1556594507952553984

pradyunsg commented 2 years ago

And... @dvzrv removed Arch's patch to debundle pip in the latest release of python-pip (22.2.2-2, love this version number). More context in #11411.

Closing this out, since... uhm... Looks like every distro that we've seen substantial reports from, in the past, has stopped debundling pip. If you do end up going down the road of debundling in the future, please stick to the description in the policy (specifically, the bit about making sure stuff doesn't break). Also, please feel welcome to reach out to us (directly over email, via an issue here, over IRC, the PyPA Discord or on discuss.python.org's Packaging category) if you see something that you'd like our input on.

dvzrv commented 2 years ago

Although we have stopped debundling, we still have issues even so: e.g. with the bundled certifi, which ships its own certificate bundle, which we usually point at our system-wide certificate setup (one place to configure things is great).

uranusjr commented 2 years ago

pip also patches certifi so it’d probably be fairly reliable to patch pip’s bundled certifi.