sarugaku / resolvelib

Resolve abstract dependencies into concrete ones
ISC License
138 stars 31 forks source link

For n packages there are O(n^2) calls to `_is_current_pin_satisfying` #147

Open notatallshaw opened 8 months ago

notatallshaw commented 8 months ago

As the number of packages to resolve increases there appears to be an O(n2) calls to _is_current_pin_satisfying and I think, at least in some circumstances, it's possible to optimize this away.

I am using steps to reproduce from https://github.com/pypa/pip/issues/12320 and the branch from this PR to avoid network calls in the call graph https://github.com/pypa/pip/pull/12327 and not have O(n2) calls to Pip collecting packages from local directory. In the future I plan to be able to show this performance issue using https://github.com/pradyunsg/pip-resolver-benchmarks

In this example there are ~1300 packages to install and _is_current_pin_satisfying is called ~2.5 million times, I generated the call graph using cProfile and gprof2dot:

output

I have not yet investigated how one might optimize this.

This is motivated from a real world usage report: https://github.com/pypa/pip/issues/12314.

notatallshaw commented 8 months ago

An idea I will investigate if no one else pipes in, but maybe it's possible to keep track of satified and unsatisfied names in the state, rather than needing to recalculate them each time.

notatallshaw commented 8 months ago

An initial observation, unsatisfied_names:

unsatisfied_names = [
    key
    for key, criterion in self.state.criteria.items()
    if not self._is_current_pin_satisfying(key, criterion)
]

Is almost always equal to:

self.state.criteria.keys() - self.state.mapping.keys()

Very rarely there is an additional name due to this part of _is_current_pin_satisfying:

all(
    self._p.is_satisfied_by(requirement=r, candidate=current_pin)
    for r in criterion.iter_requirement()
 )

And this is what is the expensive part of the call is, but possible a trick that could be applied is just assume unsatisfied_names = self.state.criteria.keys() - self.state.mapping.keys(), and at completion this approximate unsatisfied names will equal the real unsatisfied names (i.e. it will be 0).

notatallshaw commented 8 months ago

FYI I have not made any meaningful progress on this and doesn't expect to any time soon.

I tried avoiding calling _is_current_pin_satisfying in most cases of calculating unsatisfied_names , but I got lots of test failures that I beleive to be valid, it feels like it should be possible so possibly there was just an unidentified bug in my code. I will take a look at this again when I have a chance.

notatallshaw commented 7 months ago

FYI, a workaround is for the provider to cache the calls, a PR has been raised on Pip side to do that: https://github.com/pypa/pip/pull/12453