vanschelven opened 2 years ago
Before thinking about what may cause this, can I first ask why this is considered a problem? Do you have concrete examples where this causes difficulties?
As per the linked issue (about INFO spam): to reproduce that particular issue it would be nice if reproducing it once meant reproducing it always.
The more important case (for me) was the thing I was actually working on when I ran into the mentioned issue: while debugging a problem with a pip install invocation in some CI/CD pipeline, I was trying to zoom in on differences, as one often does while debugging. That usually means having a "known good" and a "known bad" situation, comparing them, and step by step bringing them closer together while comparing outputs. In such a scenario it is very unhelpful if the outputs change all the time.
NB: in the example above the differences are trivial (the ordering of successful operations), but in the interesting case (failures) the differences may be more pronounced, i.e. more confusing.
> what may cause this
The liberal use of set() and frozenset() throughout the codebase comes to mind, especially as part of the dependency resolution.
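A minimal sketch of why set() can make ordering vary between runs, assuming the elements are str values (as package names are): CPython randomizes str hashing per process unless PYTHONHASHSEED is fixed, and a set's iteration order depends on those hashes, whereas a dict iterates in insertion order. The package names and snippets below are illustrative only, not taken from pip's code.

```python
import os
import subprocess
import sys
from typing import Iterable, Set

# Illustrative package names (str values, whose hashes are randomized per process).
NAMES = ["furl", "idna", "requests", "urllib3", "certifi", "six"]

# Each snippet deduplicates NAMES and prints the resulting iteration order.
SET_SNIPPET = f"print(list(set({NAMES!r})))"
DICT_SNIPPET = f"print(list(dict.fromkeys({NAMES!r})))"


def orders_across_seeds(snippet: str, seeds: Iterable[int]) -> Set[str]:
    """Run `snippet` in a fresh interpreter once per PYTHONHASHSEED value
    and return the distinct outputs observed."""
    outputs: Set[str] = set()
    for seed in seeds:
        env = {**os.environ, "PYTHONHASHSEED": str(seed)}
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        outputs.add(result.stdout.strip())
    return outputs


if __name__ == "__main__":
    seeds = range(1, 6)
    # Typically more than one variant: set order follows the per-run hashes.
    print("set variants: ", len(orders_across_seeds(SET_SNIPPET, seeds)))
    # Always 1: dict.fromkeys keeps first-insertion order.
    print("dict variants:", len(orders_across_seeds(DICT_SNIPPET, seeds)))
```

Fixing PYTHONHASHSEED (e.g. PYTHONHASHSEED=0) for a couple of pip runs could likewise be a quick way to check whether a given ordering difference comes from hash randomization at all.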
We’re not going to prohibit the use of sets/frozensets…
> We’re not going to prohibit the use of sets/frozensets…
Why not? Wouldn't drop-in replacements that preserve insertion order be easy enough to add?
We don't have the interest or bandwidth to maintain (or vendor) an ordered set implementation when the stdlib supplies data structures that do what we need. IMO this issue isn't significant enough to justify that sort of maintenance overhead.
I was thinking of contributing it myself, TBH, but it has become clear to me that this is not something that you (plural?) would appreciate.
On Fri, Nov 4, 2022, 11:38 Paul Moore wrote:
> We don't have the interest or bandwidth to maintain (or vendor) an ordered set implementation when the stdlib supplies data structures that do what we need. IMO this issue isn't significant enough to justify that sort of maintenance overhead.
It's my personal opinion; feel free to wait and see what the other pip maintainers think if you want. But it's not so much the initial contribution of the code that matters to me, it's the ongoing need to remember not to use sets anywhere in the codebase in future, in case they result in nondeterministic behaviour. Also, how would we add a test to ensure that pip continues to behave deterministically? Without a test we couldn't be sure we wouldn't go back to being nondeterministic.
I just don't think the issue is serious enough to be worth working through all the implications involved in fixing it.
Wouldn't the easier fix be, when iterating through a collection that affects pip's user-presented ordering, to use sorted with a key?
This already happens when choosing what package to backtrack on.
I'm sure then it would be possible to add a test that confirms this order is followed?
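A sketch of the sorted-with-a-key idea plus the kind of test it would allow, assuming a hypothetical Requirement tuple rather than pip's actual internal types. canonical_key applies the PEP 503 name normalization rule; the check shuffles the input to show that the reported order no longer depends on the order in which results were produced.

```python
import random
import re
from typing import Iterable, List, NamedTuple


class Requirement(NamedTuple):
    """Hypothetical stand-in for a resolved requirement (not pip's own class)."""

    name: str
    version: str


def canonical_key(name: str) -> str:
    # PEP 503 name normalization: runs of '-', '_' and '.' become a single '-',
    # and the result is lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def reporting_order(reqs: Iterable[Requirement]) -> List[Requirement]:
    """Return requirements in a stable, user-facing order, independent of
    whatever (possibly hash-dependent) order they were produced in."""
    # The version string is only a tie-breaker to keep the order total;
    # it is compared as plain text, not as a parsed version.
    return sorted(reqs, key=lambda r: (canonical_key(r.name), r.version))


if __name__ == "__main__":
    reqs = [
        Requirement("idna", "3.4"),
        Requirement("furl", "2.1.3"),
        Requirement("Six", "1.16.0"),
    ]
    shuffled = reqs[:]
    random.Random(0).shuffle(shuffled)
    # Same output no matter how the input was ordered.
    assert reporting_order(shuffled) == reporting_order(reqs)
    print([r.name for r in reporting_order(reqs)])  # ['furl', 'idna', 'Six']
```

This only stabilizes the order in which results are reported, not the order in which they are processed.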
Agreed, something like that sounds far more plausible (although given that the reported issue was with the ordering of the "processing..." lines, I don't think that would help in the OP's case, as that would require changing the order of processing, not just the order of reporting).
Agreed, my wording wasn't quite right; I meant whenever iterating through any collection whose order would affect the user.
With regard to the symptoms of this issue, I have reproduced it myself just by running the same command several times:
python -m pip download -r requirements.txt -d downloads
It seems to me that the top-level packages are resolved in the order the user gives them, and it's the order of transitive dependencies that can change. I'm trying to think of hypothetical situations where this could have significant user impact:
But I've never seen a user report of this, so if it is possible, it doesn't seem easy to trigger.
> an ordered set implementation
In my naive mind any ordered dict could be used (e.g. storing None for all keys), i.e., in the Python versions relevant to pip, a plain dict. But perhaps I'm missing something specific to pip that makes this hard.
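A minimal sketch of that idea, assuming Python 3.7+ (where dicts are guaranteed to preserve insertion order): a set-like class that stores its elements as dict keys mapped to None. InsertionOrderedSet is a hypothetical name, not something in pip or the stdlib.

```python
from collections.abc import MutableSet
from typing import Dict, Hashable, Iterable, Iterator, Optional


class InsertionOrderedSet(MutableSet):
    """Hypothetical minimal ordered set backed by a plain dict.

    Elements are stored as dict keys mapped to None; since Python 3.7 dicts
    iterate in insertion order, so this set does too.
    """

    def __init__(self, items: Optional[Iterable[Hashable]] = None) -> None:
        # dict.fromkeys drops duplicates while keeping first occurrences.
        self._data: Dict[Hashable, None] = dict.fromkeys(() if items is None else items)

    def __contains__(self, item: object) -> bool:
        return item in self._data

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def add(self, item: Hashable) -> None:
        # Re-adding an existing element keeps its original position.
        self._data[item] = None

    def discard(self, item: Hashable) -> None:
        self._data.pop(item, None)

    def __repr__(self) -> str:
        return f"{type(self).__name__}({list(self._data)!r})"


if __name__ == "__main__":
    s = InsertionOrderedSet(["idna", "furl", "idna"])
    s.add("certifi")
    s.add("idna")  # already present: position unchanged
    print(s)  # InsertionOrderedSet(['idna', 'furl', 'certifi'])
```

Inheriting from collections.abc.MutableSet derives the rest of the set API (remove, pop, |, &, -, ^ and the in-place variants) from these methods. One caveat: the inherited & builds its result by iterating the right-hand operand, so its order follows that operand rather than self.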
> Also, how would we add a test to ensure that pip continues to behave deterministically? Without a test we couldn't be sure we wouldn't go back to being nondeterministic.
In my experience, testing deterministic code is much simpler than testing non-deterministic code (precisely because tests can rely on various orderings), both for happy paths and for pinpointing bugs (which may disappear on the next run in non-deterministic code). And testing that the code behaves deterministically basically comes for free, because you're very likely to start relying on the deterministic behaviors in your test expectations.
In fact, this reliance on deterministic behavior is so automatic that if there are any arguments to be made against deterministic behavior in the context of testing, they would run quite opposite to the one you just made. Namely: [1] in practice, tests will start relying so heavily on deterministic behaviors that are in fact implementation details that refactoring those details becomes harder, and [2] the fact that your code behaves more uniformly may obscure some failures that would be easier to detect in a randomly behaving codebase. Still, in my mind these disadvantages are minor compared to the advantage of your code always behaving the same way. If these were strong arguments, one would introduce randomness in as many places as possible, and that's not something we tend to do, for obvious reasons.
> Wouldn't the easier fix be, when iterating through a collection that affects pip's user-presented ordering, to use sorted with a key?
If the iteration itself does not have side effects, this would indeed be simpler. I do seem to remember, however, that in some cases iterators were specifically introduced in the pip codebase to defer side effects, so I'm not 100% sure this is possible (or at least easier).
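A small sketch of that concern, assuming a hypothetical lazy generator whose per-item side effect (here just a log entry) stands in for real deferred work: wrapping the generator in sorted() forces every item, and therefore every side effect, to happen before the first item is used.

```python
from typing import Iterator, List


def prepare(names: List[str], log: List[str]) -> Iterator[str]:
    """Hypothetical lazy step: the side effect for each name (a log entry here)
    only happens when the consumer pulls that name from the generator."""
    for name in names:
        log.append(f"prepare {name}")
        yield name


def consume_lazily(names: List[str]) -> List[str]:
    log: List[str] = []
    for name in prepare(names, log):
        log.append(f"use {name}")
    return log


def consume_sorted(names: List[str]) -> List[str]:
    log: List[str] = []
    # sorted() must drain the whole generator before it can return anything,
    # so every "prepare" side effect runs before the first "use".
    for name in sorted(prepare(names, log)):
        log.append(f"use {name}")
    return log


if __name__ == "__main__":
    print(consume_lazily(["b", "a"]))  # ['prepare b', 'use b', 'prepare a', 'use a']
    print(consume_sorted(["b", "a"]))  # ['prepare b', 'prepare a', 'use a', 'use b']
```

So sorting is only a drop-in fix where the iteration has no such deferred side effects; otherwise it also changes when the work happens, not just the order in which results appear.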
Instead of getting back into an extensive discussion about the nature of deterministic vs non-deterministic order in data structures, I reckon it'd be a better investment of effort to investigate the specific cause of this issue.
> investigate the specific cause of this issue
If curiosity and spare time come together I might just do that
Description
Subsequent calls to pip install do not execute in the same order, even when wheels are vendored and no index is used. This seems unnecessarily nondeterministic (to me), and makes it harder than necessary to reproduce bugs (including bugs in pip).
Expected behavior
No response
pip version
22.0.2
Python version
Python 3.10
OS
any (presumably); in practice: ubuntu
How to Reproduce
Note the positioning of furl-2.1.3-py2.py3-none-any.whl and idna-3.4-py3-none-any.whl in the 2 subsequent runs.
Output
No response