Open notatallshaw opened 7 months ago
Two points here.
I don’t see anything in the linked resolvelib discussion suggesting that they are waiting for “buy-in from pip maintainers”. But if it helps, I’m happy to say we should use the method if it’s implemented in resolvelib.
Pip is the main driver of the resolvelib. And this tagently related issue (different problem, possibly same solution), has been waiting on feedback from Pip maintainers: https://github.com/sarugaku/resolvelib/issues/134. So I don't see how this would be any different.
This isn’t about naming, but I don’t know what it means for a name to be “unsatisfied”. I assume the resolvelib PR will be updated to document the meaning of this term before it’s approved.
I would gladly take another suggestion on a name, it's simply a reference to the existing variable unsatisfied_names
: https://github.com/sarugaku/resolvelib/blob/main/src/resolvelib/resolvers.py#L422 Which is the name of all dependencies which do not yet have a satisfying solution.
Happy to fill out the docstring to explain that in more detail.
And this tagently related issue (different problem, possibly same solution), has been waiting on feedback from Pip maintainers
I added a comment on that issue.
As I said, it’s not about names. What I want is a user-facing explanation of the term “unsatisfied”. Something that will let future pip maintainers review and maintain this function without needing to read and understand the resolvelib code.
I have to say, I’m struggling to understand why this is so hard to get across. The whole point of having a separate library for the resolver is so that pip only needs to interact with a clear, documented API. I’m not being obtuse here -I genuinely do not know what it means for a name to be “unsatisfied”.
Which is the name of all dependencies which do not yet have a satisfying solution.
So, what? A provider implementation’s filter_unsatisfied_names
method will be called during the resolve by resolvelib, and it’s passed a list of names for which the algorithm hasn’t yet found a solution. The provider should return a filtered copy of this list. How? What is the client expected to achieve? What happens, for example, if I just return the first name? That’s filtered, isn’t it? If the client “gets it wrong” will the result be an incorrect solution?
The most critical thing I want to understand is what the risk is, if a future pip maintainer makes a change that impacts this function (either directly or indirectly) and gets it wrong. And following on from that, what information is available to minimise the risk that they get it wrong in the first place.
I added a comment on that issue.
Appreciate it!
I have to say, I’m struggling to understand why this is so hard to get across. The whole point of having a separate library for the resolver is so that pip only needs to interact with a clear, documented API. I’m not being obtuse here -I genuinely do not know what it means for a name to be “unsatisfied”.
I'm struggling to understand what you don't understand sorry, this is the same term used in the new resolver since it was made default in late 2020, and means the same thing, I picked that name because it was already a known term in the resolver codebase.
As I said, I'm more than happy if someone thinks there is a more descriptive name, I am not a fan of it, but I thought anyone who had to deal directly would resolution steps would at least understand it.
A provider implementation’s
filter_unsatisfied_names
method will be called during the resolve by resolvelib, and it’s passed a list of names for which the algorithm hasn’t yet found a solution. The provider should return a filtered copy of this list. How?
It doesn't need to be a copy, the provider can just return the list straight back and they get the same current behavior. I don't know what you mean by "How?", any way the provider wants to filter, or not filter, that list.
What is the client expected to achieve?
That the next step of the backtracking phase will only backtrack on those packages the client returned in the list.
What happens, for example, if I just return the first name?
Then the next step of the backtracking phase will only focus on that first name.
In real world performance for Pip it would affect the performance like so:
get_preference
only being called once per resolution step, instead of upto nThat’s filtered, isn’t it?
Yup, if that's how the client wants to focus the resolve that is completely valid. Honestly there may well be real world use cases for this approach, something about the client wanting to force resolution in the exact order they provided it without resolvelib checking get_preference on each name,
That’s filtered, isn’t it? If the client “gets it wrong” will the result be an incorrect solution?
As long as the client returns a non-empty subset of the name list (including upto the list itself), and as long as there are no bugs in resolvelib, and ignoring arbitary limits like ResolutionTooDeep
, then a correct solution will eventually be found.
The most critical thing I want to understand is what the risk is, if a future pip maintainer makes a change that impacts this function (either directly or indirectly) and gets it wrong. And following on from that, what information is available to minimise 1the risk that they get it wrong in the first place.
What's new is if they fail to honor returning a non-empty subset of the name list then resolvelib will throw an exception. Otherwise the pitfalls are exactly the same as get_preference
, it could cause resolution to take a potentially impractical amount of time to complete.
I'm struggling to understand what you don't understand sorry, this is the same term used in the new resolver since it was made default in late 2020, and means the same thing, I picked that name because it was already a known term in the resolver codebase.
That's partly my fault - I was focusing on the term "unsatisfied" when my confusion is actually more with the idea of "filtering". The rest of my comment was hopefully a little clearer. But having said that, my point here is that while it may be a known term in the resolver codebase (by which I presume you mean resolvelib) it's not well-known in the pip codebase (where, in fact, "satisfied" is typically used in the sense that a requirement is satisfied by an already-installed package). I'm trying to make sure that we don't end up in a situation where understanding the internal workings (and terminology) of resolvelib is a prerequisite for working on pip's resolution code. Maybe the issue is that because you are so familiar with the details of the resolution algorithm, you're not seeing the distinction I'm trying to make?
It doesn't need to be a copy, the provider can just return the list straight back and they get the same current behavior. I don't know what you mean by "How?", any way the provider wants to filter, or not filter, that list.
Let me be explicit then. If the list of unsatisfied names is ["foo", "bar", "baz"], how would pip know whether it's a good idea to remove "baz" from that list?
Again, this is a genuine question - I really have no idea how I'd write a useful filter_unsatisfied_names
method. It's similar to the problem with get_preference
[^1], but worse, because at least with "preference" there's an intuitive idea of what it means to "prefer" looking at a particular candidate.
As long as the client returns a non-empty subset of the name list (including upto the list itself), and as long as there are no bugs in resolvelib, and ignoring arbitary limits like ResolutionTooDeep, then a correct solution will eventually be found.
Thank you. That information is important and should very definitely be made clear in the resolvelib documentation. It means that filter_unsatisfied_names
is purely an optimisation, and will never change the result of the resolve()
call[^2]. Until now, I wasn't aware that this was the case.
But if it's not allowed to return an empty subset, that should be explicitly noted in the documentation, as well. And I hope it will trigger an immediate exception in resolvelib, rather than just doing the wrong thing because the data is bad.
[^1]: Which is also, IMO, not explained well enough in the resolvelib docs.
[^2]: Excepting, as you say, ResolutionTooDeep
limits and bugs.
Maybe the issue is that because you are so familiar with the details of the resolution algorithm, you're not seeing the distinction I'm trying to make?
Almost certainly this is contributing the less than ideal communication, but this discussion has been very fruitful I think, it's given me specific updates I can go back and apply to the resolvelib PR so hopefully it will be easier for Pip to maintain interaction with this API in the future.
Let me be explicit then. If the list of unsatisfied names is ["foo", "bar", "baz"], how would pip know whether it's a good idea to remove "baz" from that list?
I think the problem I have answering this question is because the genuine answer is: using a deep understanding of Python packaging, it's ecosystem, the algorithm type resolvelib employs, and SAT-algorithms.
The API I propose in the PR is almost identical to get_preference
, it provides the names and a bunch of objects that reflect the current state of the resolver algorithm, and in turn those objects are themselves derived from the objects that Pip provided to resolvelib to resolve. So the answers are the same as to the question "with get_preference
how would Pip know if it's a good idea to not prefer baz
?".
So as examples:
Again, this is a genuine question - I really have no idea how I'd write a useful
filter_unsatisfied_names
method. It's similar to the problem withget_preference
1, but worse, because at least with "preference" there's an intuitive idea of what it means to "prefer" looking at a particular candidate.
Agreed the name isn't intuitively unhelpful, any suggestions would be welcome: refine_resolution
? narrow_backtrack_candidates
?
Thanks. I guess my reservation is on the pip side then, as while I see that this can help a lot with resolution issues, it’s going to be critical that pip’s definition of this method is maintainable (without needing the sort of deep knowledge you possess - the pip code base is already far too inaccessible to new contributors). Let’s wait until there is a pip PR before discussing that in detail[^1].
[^1]: But for comparison, pip’s implementation of get_preference
is a good example of a clear implementation that can be understood and maintained without in-depth resolver knowledge.
As to the name, narrow_backtrack_candidates
is my preference out of the ones you suggested, but it’s only a mild preference. The real key is going to be having good documentation on the resolvelib side.
Let’s wait until there is a pip PR before discussing that in detail1
The PR exists, https://github.com/pypa/pip/pull/12499, I will update it and the resolvelib PR some time over the next week with the feedback from here.
I raise this PR before the resolvelib one is merged, released, and vendored exactly to iron out any issues with Pip (as the main customer of resolvelib) before I push for the resolvelib one to be merged. That PR includes 3 commits: 1) Add new API with no behavior change, 2) Move some existing optimizations over to new API to get speed ups, 3) Add new optimizations (as outlined by https://github.com/pypa/pip/issues/12498) to get massive speed up in complex backtracking cases.
Ah, OK. I'm afraid I don't really follow the logic in the new provider method in that PR. I'll try to take a proper look another day but my first reactions are:
filter_unsatisfied_names
method is documented in terms of "prefer backtracking on..." which is jarring to me. I'd much rather the docstring and comments be worded in terms of what the code is filtering out (and why it's doing so) and not leave it to the reader to try to work out the link from backtracking to filtering.PreferenceInformation
data, which is in need of documentation. I'm going to hope for now that the resolvelib PR will address this, and properly review those functions in light of whatever additional documentation gets added.Beyond that, I'll wait till I have some more time to do a proper review, rather than just a "quickly dash something off while I'm making food" disjointed ramble 🙂
Yeah, based on the feedback here I'm going to take quite a bit of time updating the docstrings
This API is now in resolvelib main, this issue can be closed once resolvelib is released and vendored.
Once it is vendored I plan to break out the 3 different optimizations I have created in https://github.com/pypa/pip/pull/12499 into their own PRs that I will post one at a time. In particular because the first one, moving prioritising backtrack causes from get_preference
to narrow_requirement_selection
should be relatively uncontroversial, whereas the other optimizations will require more justification.
What's the problem this feature will solve?
Pip's dependency resolution process, particularly in complex backtracking scenarios, currently faces performance issues due to repetitive calculations, leading to O(n^2) complexity. This issue arises when checking if a "name" is part of a cause but also hinders the implementation of more complex checks, as they would slow down simple backtracking scenarios.
Describe the solution you'd like
The solution proposes adding a new abstract method,
filter_unsatisfied_names
, in resolvelib, to be implemented by Pip. This method is designed to work alongside the existingget_preference
method, refining the backtracking process during dependency resolution.filter_unsatisfied_names
pre-filters unsatisfied names before they are processed byget_preference
.get_preference
, which is called once for each name,filter_unsatisfied_names
is called only once per backtracking round.filter_unsatisfied_names
, in contrast toget_preference
, which receives one name at a time.filter_unsatisfied_names
is able to short-circuiting checks.filter_unsatisfied_names
can determine if multiple names are important to backtrack on simultaneously.Alternative Solutions
get_preference
: Improvements to this method could offer some benefits, but it would struggle to enable complex backtracks without imposing performance penalties on simple backtracks.Additional context
I am not tied to the name
filter_unsatisfied_names
or even to this specific approach, as long as it's possible to filter unsatisfied names more than one at a time.A PR is open on the resolvelib side, but it is unlikely to ever be merged without buy-in from Pip maintainers: https://github.com/sarugaku/resolvelib/pull/145
Code of Conduct