Open notatallshaw opened 8 months ago
An idea I will investigate if no one else pipes in, but maybe it's possible to keep track of satified and unsatisfied names in the state, rather than needing to recalculate them each time.
An initial observation, unsatisfied_names:
unsatisfied_names = [
key
for key, criterion in self.state.criteria.items()
if not self._is_current_pin_satisfying(key, criterion)
]
Is almost always equal to:
self.state.criteria.keys() - self.state.mapping.keys()
Very rarely there is an additional name due to this part of _is_current_pin_satisfying
:
all(
self._p.is_satisfied_by(requirement=r, candidate=current_pin)
for r in criterion.iter_requirement()
)
And this is what is the expensive part of the call is, but possible a trick that could be applied is just assume unsatisfied_names = self.state.criteria.keys() - self.state.mapping.keys()
, and at completion this approximate unsatisfied names will equal the real unsatisfied names (i.e. it will be 0).
FYI I have not made any meaningful progress on this and doesn't expect to any time soon.
I tried avoiding calling _is_current_pin_satisfying
in most cases of calculating unsatisfied_names
, but I got lots of test failures that I beleive to be valid, it feels like it should be possible so possibly there was just an unidentified bug in my code. I will take a look at this again when I have a chance.
FYI, a workaround is for the provider to cache the calls, a PR has been raised on Pip side to do that: https://github.com/pypa/pip/pull/12453
As the number of packages to resolve increases there appears to be an O(n2) calls to
_is_current_pin_satisfying
and I think, at least in some circumstances, it's possible to optimize this away.I am using steps to reproduce from https://github.com/pypa/pip/issues/12320 and the branch from this PR to avoid network calls in the call graph https://github.com/pypa/pip/pull/12327 and not have O(n2) calls to Pip collecting packages from local directory. In the future I plan to be able to show this performance issue using https://github.com/pradyunsg/pip-resolver-benchmarks
In this example there are ~1300 packages to install and
_is_current_pin_satisfying
is called ~2.5 million times, I generated the call graph using cProfile and gprof2dot:I have not yet investigated how one might optimize this.
This is motivated from a real world usage report: https://github.com/pypa/pip/issues/12314.