Closed notatallshaw closed 2 weeks ago
Interestingly this seems to turn tests/test_resolvers.py::test_resolving_conflicts
into a flaky test.
I think the only behavioral change that should have happened is that get_preference
is not always called.
I will need to investigate if that's the issue or if it's something else.
I have fixed the flaky test.
The test relied on the order of resolution and this was broken by giving get_preference a set instead of a list. I got rid of keeping this set, as my code no longer relies on that anyway.
I have started to test the improvements using pip resolver branchmark: https://github.com/pradyunsg/pip-resolver-benchmarks
The default benachmark of pyrax==1.9.8, moderetly improves from 4.75 +- 0.01 to 4.62 +- 0.04 on my machine.
This benchmark is pretty interesting because it is a complicated resolve but doesn't benefit at all from choosing conflicting causes, so the extra checks for choosing conflicting causes in my experimental Pip branch contribute a little extra time (in fact this benchmark caused me to rework my checks slightly to be less likely to cause slowdowns).
The small performance improvement comes from deduplicating and filtering the causes ahead of calling get_preference
.
I'm not really a fan of the name filter_unsatisfied_names
, but I can't think of a better name right now.
Here is another example that produces ResolutionTooDeep
in the current version of Pip:
sphinx
sphinx-rtd-theme
sphinx-toolbox
myst-parser
sphinxcontrib-bibtex
nbsphinx
But can be solved by implementing this in resolvelib and adding logic on Pip side to prefer conflicting causes.
Let me know what would be needed to review and accept this, though I won't be available much until the new year, I will hopefully be pushing for it a bit harder early next year.
As resolvelib doesn't ever implement this method itself I wasn't sure what to add in terms of tests.
Working on the Pip implementation (https://github.com/pypa/pip/pull/12459), I realized it can be useful to have more than just backtrack_causes, so I have updated the filter_unsatisfied_names
to basically mirror the get_preference
arguments.
The Pip implementation now improves things to the extent that I can no longer reproduce the ResolutionImpossible
failures that backjumping causes. I will write about that more in the other issues/PRs.
I've updated this PR based on the conversation here: https://github.com/pypa/pip/issues/12497
Main changes are:
filter_unsatisfied_names
-> narrow_backtrack_selection
, still would like a better name, but I think it addresses the concerns raised in that threadget_preference
It occurs to me that resolvelib could validate that resolvelib could validate that the return value of self._p.narrow_requirement_selection
is a subset of unsatisfied_names
, this would guarantee that this method does not affect the soundness of the resolution, but at the cost of doing a subset check every backtrack iteration.
I am inclined to leave it to the documentation of how the client must implement narrow_requirement_selection
.
Okay, I've rebased, applied suggestions, updated doc strings to be more readable (at least to me), added a news entry, and extended Python functional tests to check with and without this method, and the test is a realistic use of this method.
Feeling pretty good about this now, looking for further feedback (or approval 🤞).
This allows the provider to optimize the steps it takes to choose what to backtrack on.
Specifically the aim is to add preferring conflicting causes in Pip without the overhead of recalculating or caching each time
get_preference
is called. I initially wanted to do a lot of this "preferring" work in resolvelib itself, but upon inspecting the resolution types I realized there was no guarantee of required information (which would have been a problem if this previous PR had landed https://github.com/sarugaku/resolvelib/pull/132).I have a Pip branch that implements this filtering: https://github.com/notatallshaw/pip/blob/prefer-conflicting-causes/src/pip/_internal/resolution/resolvelib/provider.py#L240. Which on Python 3.8 allows it to solve the following requirement which currently produces a "ResolutionTooDeep" error:
--only-binary ":all:" "numpy==1.21.6" "cython==0.29.28" "scipy>=1.4.0" "torch>=1.7" "torchaudio" "soundfile" "librosa==0.10.0.*" "numba==0.55.1" "inflect==5.6.0" "tqdm" "anyascii" "pyyaml" "fsspec>=2021.04.0" "aiohttp" "packaging" "flask" "pysbd" "pandas" "matplotlib" "trainer==0.0.20" "coqpit>=0.0.16" "pypinyin" "mecab-python3==1.0.5" "jamo" "bangla==0.0.2" "k_diffusion" "einops" "transformers"