pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.52k stars 3.03k forks source link

pip install nondeterministic order even for explicitly provided wheels #11572

Open vanschelven opened 2 years ago

vanschelven commented 2 years ago

Description

Subsequent calls to pip install do not execute in the same order, even when wheels are vendored, and no index is used.

This seems unnecessarily nondeterministic (to me), and makes it harder than necessary to reproduce bugs (including bugs in pip).

Expected behavior

No response

pip version

22.0.2

Python version

Ptyhon 3.10

OS

any (presumably); in practice: ubuntu

How to Reproduce

(areyoudeterministic) me:/tmp/areyoudeterministic$ cat requirements.txt 
Django==4.0.*
pytz==2022.1
whitenoise==6.0.0
sqlparse==0.4.2
cfenv==0.5.3
Authlib==1.0.1
requests==2.27.1
djangorestframework==3.13.1
djangorestframework-datatables==0.7.0
cryptography
pandas==1.5.0
reportlab==3.6.11
gunicorn==20.1.0
mysqlclient==2.1.0
django-migration-linter
cfenv==0.5.3
django-auth-ldap==4.0.0

(areyoudeterministic) me:/tmp/areyoudeterministic$ pip wheel --no-binary=:none: -r requirements.txt -w vendor
Collecting Django==4.0.*
  Using cached Django-4.0.8-py3-none-any.whl (8.0 MB)
[..]
Saved ./vendor/setuptools-65.5.0-py3-none-any.whl

(areyoudeterministic) me:/tmp/areyoudeterministic$ python -m venv .

(areyoudeterministic) me:/tmp/areyoudeterministic$ pip install -r requirements.txt --find-links=vendor --no-index
Looking in links: vendor
Processing ./vendor/Django-4.0.8-py3-none-any.whl
Processing ./vendor/pytz-2022.1-py2.py3-none-any.whl
Processing ./vendor/whitenoise-6.0.0-py3-none-any.whl
Processing ./vendor/sqlparse-0.4.2-py3-none-any.whl
Processing ./vendor/cfenv-0.5.3-py2.py3-none-any.whl
Processing ./vendor/Authlib-1.0.1-py2.py3-none-any.whl
Processing ./vendor/requests-2.27.1-py2.py3-none-any.whl
Processing ./vendor/djangorestframework-3.13.1-py3-none-any.whl
Processing ./vendor/djangorestframework_datatables-0.7.0-py2.py3-none-any.whl
Processing ./vendor/cryptography-38.0.3-cp36-abi3-manylinux_2_28_x86_64.whl
Processing ./vendor/pandas-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/reportlab-3.6.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/gunicorn-20.1.0-py3-none-any.whl
Processing ./vendor/mysqlclient-2.1.0-cp310-cp310-linux_x86_64.whl
Processing ./vendor/django_migration_linter-4.1.0-py3-none-any.whl
Processing ./vendor/django_auth_ldap-4.0.0-py3-none-any.whl
Processing ./vendor/asgiref-3.5.2-py3-none-any.whl
Processing ./vendor/furl-2.1.3-py2.py3-none-any.whl
Processing ./vendor/idna-3.4-py3-none-any.whl
Processing ./vendor/urllib3-1.26.12-py2.py3-none-any.whl
Processing ./vendor/certifi-2022.9.24-py3-none-any.whl
Processing ./vendor/charset_normalizer-2.0.12-py3-none-any.whl
Processing ./vendor/python_dateutil-2.8.2-py2.py3-none-any.whl
Processing ./vendor/numpy-1.23.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/Pillow-9.3.0-cp310-cp310-manylinux_2_28_x86_64.whl
Requirement already satisfied: setuptools>=3.0 in ./lib/python3.10/site-packages (from gunicorn==20.1.0->-r requirements.txt (line 13)) (59.6.0)
Processing ./vendor/python_ldap-3.4.3-cp310-cp310-linux_x86_64.whl
Processing ./vendor/cffi-1.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/toml-0.10.2-py2.py3-none-any.whl
Processing ./vendor/appdirs-1.4.4-py2.py3-none-any.whl
Processing ./vendor/pycparser-2.21-py2.py3-none-any.whl
Processing ./vendor/six-1.16.0-py2.py3-none-any.whl
Processing ./vendor/orderedmultidict-1.0.1-py2.py3-none-any.whl
Processing ./vendor/pyasn1_modules-0.2.8-py2.py3-none-any.whl
Processing ./vendor/pyasn1-0.4.8-py2.py3-none-any.whl
Installing collected packages: pytz, pyasn1, appdirs, whitenoise, urllib3, toml, sqlparse, six, pycparser, pyasn1-modules, pillow, numpy, mysqlclient, idna, gunicorn, charset-normalizer, certifi, asgiref, requests, reportlab, python-ldap, python-dateutil, orderedmultidict, Django, cffi, pandas, furl, djangorestframework, django-migration-linter, django-auth-ldap, cryptography, djangorestframework-datatables, cfenv, Authlib
Successfully installed Authlib-1.0.1 Django-4.0.8 appdirs-1.4.4 asgiref-3.5.2 certifi-2022.9.24 cfenv-0.5.3 cffi-1.15.1 charset-normalizer-2.0.12 cryptography-38.0.3 django-auth-ldap-4.0.0 django-migration-linter-4.1.0 djangorestframework-3.13.1 djangorestframework-datatables-0.7.0 furl-2.1.3 gunicorn-20.1.0 idna-3.4 mysqlclient-2.1.0 numpy-1.23.4 orderedmultidict-1.0.1 pandas-1.5.0 pillow-9.3.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 python-dateutil-2.8.2 python-ldap-3.4.3 pytz-2022.1 reportlab-3.6.11 requests-2.27.1 six-1.16.0 sqlparse-0.4.2 toml-0.10.2 urllib3-1.26.12 whitenoise-6.0.0

(areyoudeterministic) me:/tmp/areyoudeterministic$ rm bin/ include/ lib lib64/ -rf
(areyoudeterministic) me:/tmp/areyoudeterministic$ python -m venv .
(areyoudeterministic) me:/tmp/areyoudeterministic$ pip install -r requirements.txt --find-links=vendor --no-index
Looking in links: vendor
Processing ./vendor/Django-4.0.8-py3-none-any.whl
Processing ./vendor/pytz-2022.1-py2.py3-none-any.whl
Processing ./vendor/whitenoise-6.0.0-py3-none-any.whl
Processing ./vendor/sqlparse-0.4.2-py3-none-any.whl
Processing ./vendor/cfenv-0.5.3-py2.py3-none-any.whl
Processing ./vendor/Authlib-1.0.1-py2.py3-none-any.whl
Processing ./vendor/requests-2.27.1-py2.py3-none-any.whl
Processing ./vendor/djangorestframework-3.13.1-py3-none-any.whl
Processing ./vendor/djangorestframework_datatables-0.7.0-py2.py3-none-any.whl
Processing ./vendor/cryptography-38.0.3-cp36-abi3-manylinux_2_28_x86_64.whl
Processing ./vendor/pandas-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/reportlab-3.6.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/gunicorn-20.1.0-py3-none-any.whl
Processing ./vendor/mysqlclient-2.1.0-cp310-cp310-linux_x86_64.whl
Processing ./vendor/django_migration_linter-4.1.0-py3-none-any.whl
Processing ./vendor/django_auth_ldap-4.0.0-py3-none-any.whl
Processing ./vendor/asgiref-3.5.2-py3-none-any.whl
Processing ./vendor/furl-2.1.3-py2.py3-none-any.whl
Processing ./vendor/charset_normalizer-2.0.12-py3-none-any.whl
Processing ./vendor/idna-3.4-py3-none-any.whl
Processing ./vendor/urllib3-1.26.12-py2.py3-none-any.whl
Processing ./vendor/certifi-2022.9.24-py3-none-any.whl
Processing ./vendor/numpy-1.23.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/python_dateutil-2.8.2-py2.py3-none-any.whl
Processing ./vendor/Pillow-9.3.0-cp310-cp310-manylinux_2_28_x86_64.whl
Requirement already satisfied: setuptools>=3.0 in ./lib/python3.10/site-packages (from gunicorn==20.1.0->-r requirements.txt (line 13)) (59.6.0)
Processing ./vendor/python_ldap-3.4.3-cp310-cp310-linux_x86_64.whl
Processing ./vendor/cffi-1.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/appdirs-1.4.4-py2.py3-none-any.whl
Processing ./vendor/toml-0.10.2-py2.py3-none-any.whl
Processing ./vendor/pycparser-2.21-py2.py3-none-any.whl
Processing ./vendor/six-1.16.0-py2.py3-none-any.whl
Processing ./vendor/orderedmultidict-1.0.1-py2.py3-none-any.whl
Processing ./vendor/pyasn1-0.4.8-py2.py3-none-any.whl
Processing ./vendor/pyasn1_modules-0.2.8-py2.py3-none-any.whl
Installing collected packages: pytz, pyasn1, appdirs, whitenoise, urllib3, toml, sqlparse, six, pycparser, pyasn1-modules, pillow, numpy, mysqlclient, idna, gunicorn, charset-normalizer, certifi, asgiref, requests, reportlab, python-ldap, python-dateutil, orderedmultidict, Django, cffi, pandas, furl, djangorestframework, django-migration-linter, django-auth-ldap, cryptography, djangorestframework-datatables, cfenv, Authlib
Successfully installed Authlib-1.0.1 Django-4.0.8 appdirs-1.4.4 asgiref-3.5.2 certifi-2022.9.24 cfenv-0.5.3 cffi-1.15.1 charset-normalizer-2.0.12 cryptography-38.0.3 django-auth-ldap-4.0.0 django-migration-linter-4.1.0 djangorestframework-3.13.1 djangorestframework-datatables-0.7.0 furl-2.1.3 gunicorn-20.1.0 idna-3.4 mysqlclient-2.1.0 numpy-1.23.4 orderedmultidict-1.0.1 pandas-1.5.0 pillow-9.3.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 python-dateutil-2.8.2 python-ldap-3.4.3 pytz-2022.1 reportlab-3.6.11 requests-2.27.1 six-1.16.0 sqlparse-0.4.2 toml-0.10.2 urllib3-1.26.12 whitenoise-6.0.0

(areyoudeterministic) me:/tmp/areyoudeterministic$ pip --version
pip 22.0.2 from /tmp/areyoudeterministic/lib/python3.10/site-packages/pip (python 3.10)

Note the positioning of furl-2.1.3-py2.py3-none-any.whl and idna-3.4-py3-none-any.whl in the 2 subsequent runs.

Output

No response

Code of Conduct

uranusjr commented 2 years ago

Before thinking about what may cause this, can I ask first why this is considered a problem? Do you have concrete examples this cause difficulties?

vanschelven commented 2 years ago

As per the linked issue (about INFO spam): to reproduce that particular issue it would be nice if reproducing it once meant reproducing it always.

The more important case (for me) was the thing I was actually working on while running into the mentioned issue: when debugging a problem with a pip install invocation in some CI/CD pipeline, I was trying to zoom in on differences as one often does while debugging. That often means having a "known good" and "known bad" situation, comparing them, and trying to step-wise bring them closer and closer together while comparing outputs. In such a scenario it is very unhelpful if the outputs change all the time.

NB in the example above the differences are trivial (ordering of successful operations) but in the interesting case (failures) the differences may be more pronounced (i.e. more confusing)

vanschelven commented 2 years ago

what may cause this

The liberal use of set() and frozenset() throughout the codebase come to mind... especially as part of the dependency resolution.

pfmoore commented 2 years ago

We’re not going to prohibit the use of sets/frozensets…

vanschelven commented 2 years ago

We’re not going to prohibit the use of sets/frozensets…

Why not? Drop-in replacements which preserve order-of-adding would seem to be easy enough to add?

pfmoore commented 2 years ago

We don't have the interest or bandwidth to maintain (or vendor) an ordered set implementation when the stdlib supplies data structures that do what we need. IMO this issue isn't significant enough to justify that sort of maintenance overhead.

vanschelven commented 2 years ago

I was thinking of contributing it myself TBH but it has become clear to me that this is not something that you (plural?) would appreciate

On Fri, Nov 4, 2022, 11:38 Paul Moore @.***> wrote:

We don't have the interest or bandwidth to maintain (or vendor) an ordered set implementation when the stdlib supplies data structures that do what we need. IMO this issue isn't significant enough to justify that sort of maintenance overhead.

— Reply to this email directly, view it on GitHub https://github.com/pypa/pip/issues/11572#issuecomment-1303240090, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABWUWK6WK6WN2FVO3Y6RGDWGTRSJANCNFSM6AAAAAARWBOOAA . You are receiving this because you authored the thread.[image: Web Bug from https://github.com/notifications/beacon/AABWUWJ6T5WBZUDW6HUWTJ3WGTRSJA5CNFSM6AAAAAARWBOOACWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTSNVXOZU.gif]Message ID: @.> [ { @.": "http://schema.org", @.": "EmailMessage", "potentialAction": { @.": "ViewAction", "target": " https://github.com/pypa/pip/issues/11572#issuecomment-1303240090", "url": "https://github.com/pypa/pip/issues/11572#issuecomment-1303240090", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { @.***": "Organization", "name": "GitHub", "url": " https://github.com" } } ]

pfmoore commented 2 years ago

It's my personal opinion, feel free to wait to see what the other pip maintainers think if you want. But it's not so much the initial contribution of the code that matters to me, it's the ongoing issue that we'd need to remember to not use sets anywhere in the codebase in future, just in case it results in nondeterministic behaviour. Also, how would we add a test to ensure that pip continues to behave deterministically? Without a test we couldn't be sure we wouldn't go back to being nondeterministic.

I just don't think the issue is serious enough to be worth working through all the implications involved in fixing it.

notatallshaw commented 2 years ago

Wouldn't the easier fix be that when iterating through a collection that affects pips user presented ordering to use sorted with a key?

This already happens when choosing what package to backtrack on.

I'm sure then it would be possible to add a test that confirms this order is followed?

pfmoore commented 2 years ago

Agreed, something like that sounds far more plausible (although given that the reported issue was with the ordering of the "processing..." lines, I don't think that would help in the OP's case, as that would require changing the order of processing, not just the order of reporting).

notatallshaw commented 2 years ago

Agreed my wording wasn't quite right, I meant whenever iterating through any collection that would affected the user.

With regards to the symptoms of this issue I have reproduced it myself just by running the same command several times:

python -m pip download -r requirements.txt -d downloads

It seems to me the top level packages are resolved in user order and the order of transitive dependencies is what can change. I'm trying to think of hypothetical situations where this could have significant user impact:

But I've never seen a user report of this so it seems like it isn't easily triggered if it is possible.

vanschelven commented 2 years ago

an ordered set implementation

In my naive mind any ordered dict could be used (e.g. storing None for all keys), i.e. in Python versions relevant to pip: a dict. but perhaps I'm missing something specific to pip that makes this hard.

Also, how would we add a test to ensure that pip continues to behave deterministically? Without a test we couldn't be sure we wouldn't go back to being nondeterministic.

In my experience testing deterministic code is much simpler than non-deterministic code (precisely because tests can rely on various orderings) both for happy paths and for pinpointing bugs (which may disappear on the next run in non-deterministic code). And testing that the code behaves deterministically basically comes for free, because you're very likely to start relying on the deterministic behaviors in your tests expectations.

In fact, such automatic reliance on deterministic behavior is so automatic, that if there's any arguments to be made against deterministic behavior in the context of testing, they would be quite opposite to the one you just made. Namely: [1] that in practice tests will start relying so much on deterministic behaviors that are in fact implementation details, that this makes refactoring of such implementation details harder and [2] that the fact that your code behaves more uniformly may obscure some cases of failures that would be easier to detect in a randomly behaving code-base. Still, in my mind these disadvantages are minor in comparison to the advantage of your code always behaving in the same way. If these were strong arguments, one would introduce randomness in as many locations as possible, and that's not something we tend to do for obvious reasons.

Wouldn't the easier fix be that when iterating through a collection that affects pips user presented ordering to use sorted with a key?

If the iterating itself does not have side-effects, this would indeed be simpler. I do seem to remember, however, that in some cases iterators were specifically introduced in the pip code-base to defer side-effects, so I'm not 100% sure this is possible (or at least: easier)

pradyunsg commented 2 years ago

Instead of getting back into an extensive discussion about the nature of determistic vs non-deterministic order in data structures... I reckon it'd be a better investment of effort to investigate the specific cause of this issue.

vanschelven commented 2 years ago

investigate the specific cause of this issue

If curiosity and spare time come together I might just do that