pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Debug output shows wrong versions when debundled #8327

Open kitterma opened 4 years ago

kitterma commented 4 years ago

Environment

The most important point is that Debian unbundles pip's dependencies and builds its own wheels from the Debian archive, as required by Debian policy. As a result, the versions are not always the same as those in vendor.txt. Additionally, Debian doesn't include vendor.txt in its package since it's not used.

Description

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961540

The debug command assumes that vendor.txt is present (reasonable), but does not consider that the versions in vendor.txt may not be the ones used when the vendored modules are debundled.

Expected behavior

The debug command should show the module versions actually being used. Additionally, it would be useful to show where they are located on the system (the fact that they are debundled is already shown).

How to Reproduce

  1. Get package from Debian testing or unstable
  2. Then run pip debug
  3. An error occurs.

See the linked Debian bug.

I have a proposed change to resolve this. I know pip's support of debundling is shallow, but I think this would improve it, so I'd prefer it go upstream and not just carry it as a Debian patch. PR to follow.

pradyunsg commented 4 years ago

Sounds good to me in principle! ^.^

kitterma commented 4 years ago

Great. I'll update it after I beat flake8 into submission.

kitterma commented 4 years ago

Here's what the changed/added output looks like:

pip._vendor.WHEEL_DIR: /usr/share/python-wheels
debundled wheel versions:
  CacheControl==0.12.6
  appdirs==1.4.3
  certifi==2020.4.5.1
  chardet==3.0.4
  colorama==0.4.3
  contextlib2==0.6.0
  distlib==0.3.0
  distro==1.5.0
  html5lib==1.0.1
  idna==2.9
  ipaddr==2.2.0
  lockfile==0.12.2
  msgpack==0.6.2
  packaging==20.3
  pep517==0.8.2
  pip==20.1.1
  pkg_resources==0.0.0
  progress==1.5
  pyparsing==2.4.7
  requests==2.23.0
  resolvelib==0.3.0
  retrying==1.3.3
  setuptools==44.0.0
  six==1.14.0
  toml==0.10.1
  urllib3==1.25.9
  webencodings==0.5.1
  wheel==0.34.2
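For readers following along, here is a minimal sketch of how a "debundled wheel versions" listing like this could be derived from wheel filenames alone. It is not the code from the PR. The function name `wheel_versions` and the regular expression are illustrative assumptions; the only thing taken as given is the standard wheel filename layout from PEP 427 (`{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`).

```python
import re
from pathlib import Path

# PEP 427 filename layout:
#   {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
# The build tag is optional and, when present, must start with a digit.
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(-(?P<build>\d[^-]*))?"
    r"-(?P<py>[^-]+)-(?P<abi>[^-]+)-(?P<plat>[^-]+)\.whl$"
)


def wheel_versions(wheel_dir):
    """Return {distribution: version} for each wheel file in wheel_dir.

    Hypothetical helper for illustration; only filenames are inspected,
    the wheel contents are never opened.
    """
    versions = {}
    for path in sorted(Path(wheel_dir).glob("*.whl")):
        match = WHEEL_RE.match(path.name)
        if match:  # skip files that do not follow the naming scheme
            versions[match.group("name")] = match.group("version")
    return versions
```

Called as `wheel_versions("/usr/share/python-wheels")` on a system laid out like the one above, this would return a mapping that can be printed as `name==version` lines. Reading filenames is cheap, but it only reports what is on disk, not what was actually imported.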

xavfernandez commented 4 years ago

The point of using vendor.txt was precisely for the case of debundled pip. The goal was to ensure that the versions actually matched. Why doesn't Debian include vendor.txt in the installation?

kitterma commented 4 years ago

Because I didn't perceive it had any value. We can include it if needed.

My understanding was that vendor.txt was meant to document the bundled versions, but that it didn't mean anything about what versions pip was compatible with.

It seemed to me that for debug output you would want to know what versions are being used, not what versions were bundled (you don't need debug output for that).

xavfernandez commented 4 years ago

Well pip is tested & expected to work with the versions specified in vendor.txt.

And since some distributions debundle those dependencies it is useful to quickly check that the debundled libraries' versions match those expected by pip, hence the addition to pip debug.

The command first tries to identify the version of the library via: https://github.com/pypa/pip/blob/cdeb377a3633c9a79e4bd08f5e0026c8be7a7143/src/pip/_internal/commands/debug.py#L82-L95 (and this could actually be improved to fall back on wheel filenames like in your PR).

Then it compares the found version to the one present in vendor.txt: https://github.com/pypa/pip/blob/cdeb377a3633c9a79e4bd08f5e0026c8be7a7143/src/pip/_internal/commands/debug.py#L105-L112 to highlight possible conflicts.

It is mainly useful on distributions like Archlinux that do not rely on the WHEEL_DIR debundling strategy but instead use the libraries installed alongside pip (cf https://www.archlinux.org/packages/extra/any/python-pip/): any wild sudo pip install <one_of_pip_dependency> is very likely to cause issues.

Hence the need for vendor.txt to be there to easily identify conflicts.
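The check described in this comment can be sketched roughly as follows. This is not pip's implementation (see the linked debug.py lines for that); it is a simplified illustration under stated assumptions: the input is text with one `name==version` pin per line, `#` starts a comment, and a module's version is read from its `__version__` attribute. The names `parse_expected`, `actual_version`, and `find_mismatches` are hypothetical.

```python
import importlib


def parse_expected(vendor_txt):
    """Parse 'name==version' lines, ignoring blank lines and '#' comments."""
    expected = {}
    for line in vendor_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        expected[name.strip()] = version.strip()
    return expected


def actual_version(module_name):
    """Return the module's __version__, or None if it cannot be determined."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(expected, lookup=actual_version):
    """Return {name: (expected, found)} where a version was found and differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found is not None and found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

The `lookup` parameter is there so the comparison can be fed from any source of "actual" versions, for example a dict built from wheel filenames. Note that distribution names and import names can differ (CacheControl is imported as cachecontrol), so a real check needs a name mapping that this sketch omits.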

kitterma commented 4 years ago

OK. FWIW, I think conflict is a strong word. They are different. I think conflict implies a problem and you can't tell that from version numbers alone.

I think when I was looking through this before I didn't follow it all the way through to see what it was adding.

Let's assume for a moment that I arrange to put vendor.txt back in the Debian package, so that part of the problem goes away (that's trivial to do and I will do so), what from this PR would be useful to capture?

I think if things are debundled, it is useful to say where the unbundled wheels are located. Is WHEEL_DIR unset on Archlinux?

For finding the version, where would using wheel filenames fall in the hierarchy of figuring out the version?

xavfernandez commented 4 years ago

> what from this PR would be useful to capture?
>
> For finding the version, where would using wheel filenames fall in the hierarchy of figuring out the version?

Never mind, if https://github.com/pypa/pip/blob/cdeb377a3633c9a79e4bd08f5e0026c8be7a7143/src/pip/_internal/commands/debug.py#L87-L93 works, there is no need to check wheel filenames.

> I think if things are debundled, it is useful to say where the unbundled wheels are located. Is WHEEL_DIR unset on Archlinux?

It isn't set, so the default value is used, and no wheels are found there:

$ python -s -c "import pip._vendor;print(pip._vendor.WHEEL_DIR)"
/usr/lib/python3.8/site-packages/pip/_vendor
$ ls /usr/lib/python3.8/site-packages/pip/_vendor
__init__.py  __pycache__

pip dependencies are instead installed globally:

$ python -s -c "import pip._vendor.requests;print(pip._vendor.requests.__path__)"
['/usr/lib/python3.8/site-packages/requests']

kitterma commented 4 years ago

BTW, as a further answer to why we don't install vendor.txt, the debundling instructions provided by pip upstream say not to:

https://pip.pypa.io/en/stable/development/vendoring-policy/#debundling

kitterma commented 4 years ago

I just submitted https://github.com/pypa/pip/pull/8436 to fix that.

kitterma commented 4 years ago

I've tested how things work with vendor.txt installed, and there are issues around ipaddress (which isn't needed on Python 3). I haven't yet worked out what the best solution is. We provide an ipaddr wheel for use with Python 2, and the standard library ipaddress gets used with Python 3.