pfmoore opened this issue 4 years ago (status: open)
> is it necessary that the same candidate is returned in both calls, or is it enough that "equivalent" candidates are returned?
The resolver does not compare candidates with each other, precisely for the reason you raised: it does not (and cannot) assume that doing so makes sense. So yes, it will result in duplicated work if equivalent (whatever that means) candidates are returned by find_matches().
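To make "equivalent (whatever that means)" concrete, here is a small sketch. It assumes a naive provider that builds fresh candidate objects on every call, answering the two requirements pip >= 19.0 and pip >= 20.0 from the original question. `PlainCandidate`, `ValueCandidate`, and `naive_find_matches` are hypothetical names used only for illustration. With Python's default equality every object is distinct, so the wheel that satisfies both requirements shows up twice. Once the candidate type defines value-based `__eq__` and `__hash__`, the two copies collapse into one.

```python
class PlainCandidate:
    """Default equality: two objects compare equal only if they are identical."""

    def __init__(self, name, version):
        self.name = name
        self.version = version


class ValueCandidate(PlainCandidate):
    """Equal to any other ValueCandidate with the same name and version."""

    def __eq__(self, other):
        if not isinstance(other, ValueCandidate):
            return NotImplemented
        return (self.name, self.version) == (other.name, other.version)

    def __hash__(self):
        return hash((self.name, self.version))


def naive_find_matches(cls, minimum):
    # Hypothetical naive provider: builds fresh objects on every call from a
    # toy list of pip versions, keeping those >= minimum.
    return [cls("pip", v) for v in [(19, 0), (20, 0)] if v >= minimum]


for cls in (PlainCandidate, ValueCandidate):
    for_19 = naive_find_matches(cls, (19, 0))  # "pip >= 19.0"
    for_20 = naive_find_matches(cls, (20, 0))  # "pip >= 20.0"
    print(cls.__name__, len(set(for_19) | set(for_20)))
# PlainCandidate 3
# ValueCandidate 2
```

Neither definition is imposed by the resolver. Because it never compares candidates, what counts as "equivalent" is left entirely to the provider's candidate type.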
I would be inclined to treat this as an optimisation problem: we stay conservative for now and return some equivalent candidates when we're not sure, then gradually figure out how to eliminate them. I also don't think this would be a very big problem in practice for pip, since PackageFinder already eliminates a lot of the duplicates. The only sources of duplication would be direct URLs and local source directories, and neither is used very much currently AFAICT, since the current legacy resolver does not handle them very well.
And one (nice?) thing about the separation of concerns in this API design is that the optimization can, and should, happen on the Provider side, which is best positioned to correctly identify and cache "equivalent" candidates.
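A minimal sketch of what provider-side caching could look like, assuming a provider that treats candidates with the same (name, version, link) as equivalent and hands out a single shared object for each. `Candidate`, `InterningProvider`, the toy index, and the simplified `find_matches(name, min_version)` signature are all hypothetical; this is not resolvelib's actual `find_matches` signature.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate: project name, version, and where it comes from.
    name: str
    version: Version
    link: str


class InterningProvider:
    """Hypothetical provider that returns one shared object per equivalent candidate."""

    def __init__(self, index: Dict[str, List[Tuple[Version, str]]]) -> None:
        # index maps a project name to (version, link) pairs; it stands in for
        # whatever finder a real provider would query.
        self._index = index
        self._interned: Dict[Tuple[str, Version, str], Candidate] = {}

    def _intern(self, name: str, version: Version, link: str) -> Candidate:
        # The provider alone decides what "equivalent" means; here it is the
        # (name, version, link) triple.
        key = (name, version, link)
        if key not in self._interned:
            self._interned[key] = Candidate(name, version, link)
        return self._interned[key]

    def find_matches(self, name: str, min_version: Version) -> List[Candidate]:
        # Simplified stand-in for a requirement of the form "name >= min_version".
        return [
            self._intern(name, version, link)
            for version, link in self._index.get(name, [])
            if version >= min_version
        ]


# Toy index, for illustration only.
provider = InterningProvider(
    {"pip": [((19, 0), "pip-19.0.tar.gz"), ((20, 0), "pip-20.0-py3-none-any.whl")]}
)
for_19 = provider.find_matches("pip", (19, 0))  # "pip >= 19.0"
for_20 = provider.find_matches("pip", (20, 0))  # "pip >= 20.0"
print(for_19[1] is for_20[0])  # True: the same object, not just an equal one
```

Because this provider always returns the interned object, callers that compare by identity and callers that compare by equality see the same result. The trade-off in this sketch is that the cache keeps growing for as long as the provider object is alive.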
Cool, I'm happy with that. But just to be clear, if I follow the logic in the code: find_matches() is called for each identify() value (the requirement's "name"), and only once per name (except maybe when we backtrack; I haven't checked that code yet). So the question of "multiple copies of the same candidate" never even crops up in the resolution code.
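A heavily simplified sketch of the flow described above, assuming requirements are plain (name, minimum_version) tuples and candidates are (name, version) tuples. `collect_candidates` and its parameters are hypothetical, this is not resolvelib's actual code, and backtracking is deliberately left out.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def collect_candidates(
    requirements: Iterable,
    identify: Callable,
    find_matches: Callable,
    is_satisfied_by: Callable,
) -> Dict[Hashable, List]:
    """Call find_matches() once per identifier; later requirements only filter."""
    candidates: Dict[Hashable, List] = {}
    for requirement in requirements:
        name = identify(requirement)
        if name not in candidates:
            # First requirement seen for this name: ask the provider.
            candidates[name] = list(find_matches(requirement))
        else:
            # Same name again: narrow the existing list instead of calling
            # find_matches() a second time, so no duplicate candidates appear.
            candidates[name] = [
                c for c in candidates[name] if is_satisfied_by(requirement, c)
            ]
    return candidates


calls = []  # records find_matches() calls, for illustration only


def find_matches(requirement):
    calls.append(requirement)
    name, minimum = requirement
    return [(name, v) for v in [(19, 0), (20, 0)] if v >= minimum]


result = collect_candidates(
    [("pip", (19, 0)), ("pip", (20, 0))],
    identify=lambda req: req[0],
    find_matches=find_matches,
    is_satisfied_by=lambda req, cand: cand[1] >= req[1],
)
print(result, len(calls))  # {'pip': [('pip', (20, 0))]} 1
```

Under this flow, the second requirement never triggers a second find_matches() call, which is why duplicate copies of a candidate cannot arise from the resolution loop itself.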
IMO, at some point this should be added to the docs, as a clarification. But for now I'm happy to simply have this issue as a reference.
It's easy to lose track of this when writing Requirement and Candidate objects that have the provider methods delegated to them (as the pip prototype does at the moment). I'm wondering whether it was a mistake to do that. Cue rewrite number 20 of the pip integration code 😉
I'm honestly a little concerned about the delegation we're doing in our implementation, since it feels like it will mean more refactoring work later to clean up responsibilities. But yeah, it's not a major concern, more of a back-of-the-head thought at the moment.
The specification of the provider's find_matches method doesn't say whether candidates need to be "unique". For example, consider two requirements, pip >= 19.0 and pip >= 20.0. The candidate pip-20.0-py3-none-any.whl satisfies both of them. When a client implements find_matches on a provider, is it necessary that the same candidate is returned in both calls, or is it enough that "equivalent" candidates are returned? (To be honest, I'm not even clear what it means to be the "same" candidate here: is object identity enough?) Reasons this matters:
Some operations on a candidate can be expensive (for example, calling identify could involve building the project to get the project name), so we want to avoid doing them multiple times if it's not needed.

I can look at the existing code to determine how things work, but this should be documented so that the implementation isn't constrained to keep internal details the same because clients rely on them.
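A minimal sketch of how a provider might avoid repeating an expensive identify step, assuming the project name of a URL or local directory can only be found by a slow build. `project_name` and `identify` here are hypothetical stand-ins (the "build" just takes the last path component); `functools.lru_cache` does the memoization, keyed on a normalized source string.

```python
import functools

build_calls = []  # records each simulated build, for illustration only


@functools.lru_cache(maxsize=None)
def project_name(source: str) -> str:
    """Hypothetical: determine the project name of a URL or local directory.

    A real provider might have to build the project to read its metadata,
    which is slow, so results are memoized per distinct source string.
    """
    build_calls.append(source)
    # Stand-in for the expensive build: use the last path component.
    return source.rsplit("/", 1)[-1]


def identify(source: str) -> str:
    # Hypothetical identify(): normalize first, so different spellings of the
    # same location share one cache entry and one build.
    return project_name(source.rstrip("/"))


print(identify("/src/myproject"), identify("/src/myproject/"))  # myproject myproject
print(len(build_calls))  # 1
```

The normalization step matters: without it, `/src/myproject` and `/src/myproject/` would be two cache keys and would trigger two builds.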