pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License
2.54k stars 1.19k forks source link

Setuptools and pip manage vendored namespace incompatibly #2052

Closed kitterma closed 3 years ago

kitterma commented 4 years ago

In Debian we've been having an issue for some time where version comparisons where one comes from pip and the other comes from setuptools are incompatible [1]. Today @dstufft helped me look into it and determined that the two tools have incompatible approaches to vendoring that are the source of this problem. Based on that insight (and very specific help, again from @dstufft ), I came up with a hack to work-around the problem for now [2].

It's certainly not ideal, but I think it demonstrates the problem. To quote him from IRC earlier today:

what pip does is basically loop over all of our vendored modules, manually import them, and then munge sys.modules so that like, pip._vendor.packaging.version is the same object you would get back when you do import packaging.version. setuptools does it differently, they install a meta importer, that at the import level does the aliasing that effectively means that for pip, there is only one packaging module imported both at packaging and pip._vendor.packaging, the pip._vendor.packaging is just a tiny shim, whereas for setuptools packaging and pkg_resourcex.extern.packaging are two fully unique objects it's my opinion this is an upstream bug, either in pip or setuptools (or likely both) and we're going to need to coordinate to ensure that both of our debundling strategies can co-exist, because they currently can't

Thanks for considering,

Scott K

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912379 [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=912379;filename=pip.versions.patch;msg=39

dstufft commented 4 years ago

FWIW, pip used to do it the same way that setuptools did it here, and we switched away from doing it like 5 years ago. Unfortunately I don't have a lot of memory of why that was done, but that was done in https://github.com/pypa/pip/pull/2925.

I have the vaguest memory of the problem basically being that in cases where our vendored libraries had import errors on their own, we would hide those import errors, thus making it more difficult for end users to diagnose a broken install, even i they weren't actually using the bundling.

The way the de-bundling logic works now, is that it's entirely a no-op for everyone using pip, HOWEVER, if a downstream (such as debian) does debundle, they're expected to carry a small patch that effectively just toggles a constant from False to True, and then the debundling logic kicks in, which makes it impossible for the debundling logic to cause issues for installs that weren't debundled.

Given pip used to have a mechanism similiar to setuptools and we moved away from it, I think it makes sense to port pip's logic over into setuptools here-- but I'm not personally 100% tied to that if the setuptools devs feel strongly.

Tagging @pypa/pip-committers because this is a cross cutting issue that only really becomes a problem with the interaction between our two projects.

pganssle commented 4 years ago

I don't actually understand the problem here. Why do these vendoring mechanisms need to be compatible? Is there a simple minimal example that demonstrates a problem it could cause?

dstufft commented 4 years ago

They need to be compatible because we're both using packaging.version.Version and when you try to compare a packaging.version.Version that comes from inside of pip with one that comes from inside of setuptools (well pkg_resources) it fails, because it thinks they're two unrelated classes that you're trying to compare.

It works inside of pip natively, because pip's bundling of setuptools rewrites the imports inside of setuptools to pull from pip._vendor.packaging instead of pkg_resources.extern.packaging.

However, when both pip and setuptools are running in a debundled state pip's debundling logic means that import pip._vendor.packaging is actually the same object as import packaging (and if you look at like, __class__ and stuff on the objects inside of it, you'll se the qual name makes no mention of pip). Whereas setuptools views them as two entirely distinct namespaces (and actually, if pip adopted setuptools style debundling support actually, we'd have 3 distinct namespaces, so that wouldn't even work..).

So effectively, there are 2 real options:

  1. setuptools adopts a pip style debundling logic, which means that when debundled packaging.version.Version, pip._vendor.packaging.version.Version, and pkg_resources.extern.packaging.version.Version will all be the exact same object (and that object will be packaging.version.Version.
  2. Setuptools declares what pip is doing as unsupported, and pip has to go and find anyplace we comingle pip._vendor.packaging.version.Version and pkg_resources.extern.packaging.version.Version (really it's pkg_resources.SetuptoolsVersion) and likely "casting" any setuptools version to our copy of packaging's Version by doing like. version = pip._vendor.packaging.version.parse(str(dist.parsed_version)), which seems wasteful to me?
dstufft commented 4 years ago

Also you can repro in any place where both pip and setuptools are debundled (Debian unstable docker container is what I'm using) with:

import pip._vendor.packaging.version as pv
import pip._vendor.pkg_resources.extern.packaging.version as sv

pv.parse('1.0') >= sv.parse('1.0')

Which gives an error like:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: '>=' not supported between instances of 'Version' and 'Version'
pganssle commented 4 years ago

Hm, my inclination here is to suggest that we not support this sort of "debundling" configuration, and leave it to Debian or whoever wants to "debundle" to get it working. We barely have the resources to support the standard configuration here, so I'm not super in favor of supporting unusual configurations with regards to dependencies. That said, that might be tinged with a bit of bias because I really dislike the way Debian modifies packages to separate out python and python-pip, python-venv, etc.

On the other hand, from what I can we are not vendoring our dependencies in an attempt to pin them to a specific version for compatibility purposes - we're vendoring them because it's too hard to solve the bootstrapping problem (I believe @jaraco has more insight into this), so I suppose it isn't the worst thing in the world for us to make it easier for people who want to solve that bootstrapping problem a different way.

With regards to compatibility issues, I think the answer here may depend on why pip and setuptools are vendoring in the first place. If the answer is just bootstrapping and not pinning, then it maybe shouldn't be a problem for them to hit the same actual module objects (though mucking around with sys.modules sounds much more likely to cause problems than the approach setuptools is taking IMO), but if any of the packages in this cluster are vendoring because they need to pin specific versions and there is a version conflict, the "incompatible" solution that setuptools uses will solve that problem, whereas the "compatible" solution won't.

One way to cut the Gordian knot in this situation might be to change packaging.Version's comparison operations to depend on a protocol rather than isinstance checks, in which case the two different versions of packaging.Version can be compared without an issue, but I assume that this was just offered as an exemplar of the wider problem and not the major item of concern here.

dstufft commented 4 years ago

I think the answer here may depend on why pip and setuptools are vendoring in the first place.

I do not think that in either case we are vendoring to pin version, just to solve the bootstrapping problem yea. At least that's true in pip's case, and my memory of setuptools is that they did it for the same reason but my memory could be faulty.

though mucking around with sys.modules sounds much more likely to cause problems than the approach setuptools is taking IMO

Well, pip took a very similar mechanism at first and ultimately changed it due to problems (which affected people who weren't even using a debundled configuration), and since then the only problem I've seen come out of it has been in it's interaction with setuptools debundling.

Trying to dig through the old issues, I suspect that the problems pip ran into wouldn't effect setuptools in it's current state, but that's mostly an accident of what packages setuptools is bundling. The fundamental problem that pip ran into, is the setuptools style shim, is that it couldn't determine the difference between an ImportError because the package wasn't bundled anymore and an ImportError because someone's Python wasn't functional for us. The specific issue pip ran into, was someone had a Python built without ssl support, so trying to import ssl rose an ImportError for _ssl, when we attempted to import pip._vendor.requests, we got an ImportError, so we swallowed that ImportError (as setuptools does as well), and moved on to trying to import from the unbundled package, which also got an ImportError, so we just rose an ImportError saying that pip._vendor.requests couldn't be imported, with no additional information.

All of that is true with setuptools style debundling, and importantly this error was happening to people who were not debundling, all it took to trigger this edge case was one of the vendored libraries throwing an ImportError for something it expected to exist outside of itself (typically the stdlib). The pip style approach is very specific in that it does not ever swallow an ImportError or replace it with something other than the actual expected ImportError-- infact it entirely a no-op for people who are not patching pip to remove the bundled dependencies, because the entire debundle logic is hidden behind a conditional that always evaluates to True, so for it to ever have an effect on end users, they would have to modify the installed package.

More importantly, I don't think two instances of the setuptools style debundling can work together as suggested by this issue, because the fundamental problem here is that we're getting a version from inside setuptools and one from not inside setuptools and those being seen as two entirely different objects, having pip move to setuptools style debundling wouldn't change that situation at all, it'd still be two entirely different classes as far as Python is concerned.

if any of the packages in this cluster are vendoring because they need to pin specific versions and there is a version conflict, the "incompatible" solution that setuptools uses will solve that problem, whereas the "compatible" solution won't.

I don't think this is a major concern, downstream already have to ensure that a single version works for all of the packages they make available since Python's import system doesn't allow multiple versions like that (and afaik, pip is the only project that gets special treatment here that has a distro effectively re-bundle to where they conceivably could do that so it wouldn't affect setuptools anyways). I suspect that, worst case scenario, if say setuptools depended on a version of a dependency that was just fundamentally incompatible with the rest of the single set of versions they make available in the archive, they would either patch setuptools to support the version they do provide OR simply not unbundle that particular dependency.

I don't think that matters though, because AFAIK none of us are actually using bundling to pin to specific versions due to incompatibility but only to solve the bootstrap problem (and for pip specifically, to ensure you can't get your system in a broken state where you pip uninstall packaging or something).

One way to cut the Gordian knot in this situation might be to change packaging.Version's comparison operations to depend on a protocol rather than isinstance checks, in which case the two different versions of packaging.Version can be compared without an issue, but I assume that this was just offered as an exemplar of the wider problem and not the major item of concern here.

As far as I know, this is the only place that this currently affects, though the problem could certainly exist elsewhere that we just haven't uncovered yet. Personally, since this only affects debundling, I wouldn't change the API of packaging at all, if setuptools ultimately decides they'll keep their vendoring solution, I would just adjust pip to ensure that anytime we get a Version object from setuptools, that we translate it to a "real" packaging.version.Version object by roundtripping it through str(). That seems making things harder though for something that only really matters in one specific case.

xavfernandez commented 4 years ago

To add some context, pip vendoring mechanism is here while setuptools' one is here.

I think the answer here may depend on why pip and setuptools are vendoring in the first place.

(and for pip specifically, to ensure you can't get your system in a broken state where you pip uninstall packaging or something).

From my point of view, this is the main reason pip vendors its dependencies: to prevent the user from easily breaking pip and getting stuck.

What would prevent setuptools to switch to "pip-style" vendoring ?

pradyunsg commented 4 years ago

What would prevent setuptools to switch to "pip-style" vendoring ?

A gentle nudge on this.

jaraco commented 4 years ago

I’m not opposed to setuptools adopting pips technique. The motivation for the meta importer was to make it trivially easy for a downstream packager to debundle (install vendored packages, delete _vendor dir).

Another requirement I have is that the tool chains work on all major platforms and doesn’t depend on bash or make or other Windows-hostile features. The vendoring package services this need nicely.

I would be inclined to accept a PR that adapts setuptools bundling you use pips technique.

jaraco commented 4 years ago

I’m also highly interested in removing vendoring in setuptools. It feels like to me there should be at most one tool in the python ecosystem that requires specialized bootstrapping. I’ve seen some signs that downstream packagers may be able to move to PEP 517 and pip to build setuptools from source for package manager. I’d love for this to become the standard op procedure so that setuptools could drop vendored dependencies altogether.

eli-schwartz commented 3 years ago

This is a duplicate of #1383

The problem isn't due to pip + setuptools, it's a general setuptools problem as I demoed there.

I also have vague memories that some other project using pkg_resources.parse_version() and packaging.version.Version() was broken for this reason, but I can't remember where this vague memory comes from. Given there are projects that try to be helpful and make their modulename.__version__ attribute be a Version instance out of the box (especially when they already depend on or assume setuptools is installed, whether for entry point plugins or because historically setup.py entry points imported pkg_resources, or even more excitingly, if they try importing packaging and fall back on importing pkg_resources.extern.packaging), it really is quite unsafe IMO to assume you know which kind of Version() it is across project boundaries.

This is why my bug report referenced pip a lot in the report, but also took pains to point out the comparison to using packaging.version.Version as a third point of reference.

jaraco commented 3 years ago

Thanks @eli-schwartz. I'll close it as duplicate of #1383, but I'll flag this issue as having a lot of useful context.

FWIW, Setuptools is now discouraging the use of pkg_resources in almost every case in favor of importlib.{metadata,resources}. The __version__ could (and should) be serviced by importlib.metadata, although I've been discouraging use of this pattern and instead advising packages to remove this attribute in favor of having the consumer just call importlib.metadata.version() directly. In either case, use of packaging becomes explicit downstream (importlib.metadata doesn't provide any Version objects).