Resolving depends on the order of the root requirements in some cases

pfmoore commented 4 years ago

The following test fails.

def test_requirements_different_candidate_sets():
    requirements = {
        "r1": ["c1"],
        "r2": ["c2"],
    }

    class Provider(AbstractProvider):
        def identify(self, d):
            return "r" if d.startswith("r") else d

        def get_preference(self, *_):
            return 0

        def get_dependencies(self, _):
            return []

        def find_matches(self, r):
            return requirements.get(r, [])

        def is_satisfied_by(self, r, c):
            if r == "r1":
                return c in requirements[r]
            elif r == "r2":
                # r2 accepts anything, even stuff it doesn't
                # return in find_matches
                return True
            return False

    resolver = Resolver(Provider(), BaseReporter())
    result = resolver.resolve(["r1", "r2"])
    assert result.mapping["r"] == "c1"

    resolver = Resolver(Provider(), BaseReporter())
    result = resolver.resolve(["r2", "r1"])
    assert result.mapping["r"] == "c1"

Note that the two tests at the end are the same request, just with the order of the requirements reversed.

The requirements have been constructed so that:

They are both for the same "project".
r1 is a normal sort of requirement.
r2 accepts the candidate r1 returns as well as its own one, but doesn't return that candidate in find_matches.

This specific configuration came up in https://github.com/pypa/pip/pull/8136.

I don't think the result of a resolve() call should be order-dependent like this, even if we argue that the provider is pathological. If we don't want to support providers like this, we need to document precisely what rules the provider has to follow - and we should validate them and produce a proper error, not just return an incorrect result.

(Personally, I'd like to get this case to work, as it's far easier to implement pip's constraints if we can).

pfmoore commented 4 years ago

Trying to express the rule that providers must follow, that makes this provider "pathological":

A provider's is_satisfied_by method shouldn't return True for any candidates that would not have been returned by a call to get_matches for the same requirement.

The reason for this constraint is that otherwise, the following operation isn't commutative:

def intersection(r1, r2):
    [c for c in resolver.get_matches(r1) if resolver.is_satisfied_by(r2, c)]

(We could simply require that intersection() is commutative, but that may be too technical for people implementing providers in practice to reason about).

Without this rule, the loop here will return different results depending on the order of the root requirements.

Sadly, pip's constraint requirement implementation doesn't satisfy that rule.

Note that we don't have to require this rule is satisfied. We could instead document that find_matches is only ever called on the first requirement with a given identity that is encountered (i.e., earliest in the root requirements set, or first appearance as a dependency).

uranusjr commented 4 years ago

One way to make the commutative property obvious would be to redesign the provider interface into find_all_matches(identifier) and is_satisfied_by(candidate, requirement). This way A & B would be

def intersection(r1, r2):
    identifier = provider.identify(r1)
    assert identifier == provider.identify(r2)
    return [
        c for c in provider.find_all_matches(identifier)
        if provider.is_satisfied_by(r1, c) and provider.is_satisfied_by(r2, c)
    ]

This can still return wrong results if provider is stateful (order of calling is_satisfied_by() affects the return value), but I think it’s much more reasonable to expect people to not do that.

I remember you mentioned earlier in Zulip this would make constraints more difficult to implement. Is it still the case if this problem is taken into consideration?

pfmoore commented 4 years ago

I remember you mentioned earlier in Zulip this would make constraints more difficult to implement. Is it still the case if this problem is taken into consideration?

It completely breaks the current implementation, but I've come to the conclusion that it's broken no matter what we do (and I have a fair idea of how I can implement constraints in a different way without the issue, at a hopefully-slight performance cost). So I'm OK with this change now.

uranusjr commented 4 years ago

Great, I’ll work on this tomorrow (including a new resolvelib release & vendoring update if everything goes as planned).

uranusjr commented 4 years ago

Note for self: This is probably aa good time to switch to the iterator-based best-at-front interface we talked about previously.

sarugaku / resolvelib

Resolving depends on the order of the root requirements in some cases #48