sarugaku / resolvelib

Resolve abstract dependencies into concrete ones
ISC License
142 stars 31 forks source link

Store failure cause and provide to _get_preference #84

Closed notatallshaw closed 3 years ago

notatallshaw commented 3 years ago

As per https://github.com/pypa/pip/issues/10479

If pip prefers failure causes then resolvelib must provide them to get_preference

Required for https://github.com/pypa/pip/pull/10481

uranusjr commented 3 years ago

Saving the failure causes directly on the resolver feels wrong to me, since the cases are only specific to a backtracking session and not the entire resolution. I feel it makes more sense to put them on the State objects instead (or another stack that only manages backtracking, but I want to avoid that if possible).

notatallshaw commented 3 years ago

Saving the failure causes directly on the resolver feels wrong to me, since the cases are only specific to a backtracking session and not the entire resolution. I feel it makes more sense to put them on the State objects instead (or another stack that only manages backtracking, but I want to avoid that if possible).

I'd not thought about this approach. Taking a quick look at the code pushing new state seems to happen Pip doesn't have new failure causes, so I'm not quite sure how this would be achieved. I'll try and study the code more carefully and think on it.

uranusjr commented 3 years ago

Expanding on the thought, Criterion.incompatibilities is calculated from similar sources to failure_causes, so perhaps we should try to combine the two instead and pass whatever is exposed on Criterion to Provider.get_preferences instead. This would also make the signature somehow lines up better with Provider.find_matches; them being different has always felt weird to me.

uranusjr commented 3 years ago

Some more context on why I feel the attribute should be on State (either directly or on criterion objects on it) that may help putting things into context.

A resolver is fundamentally traversing deeper and deeper into a graph to find a working solution. This can be most naively implemented by recursion, but you know, Python does not do that particularly well, so the resolver is, at the highest level, unrolling the recursion into a loop, and states is the stack that holds context for each iteration that’s done by call frames in recursion. One recursion is one interation in the main loop, and its call frame is created by pushing a State object.

With recuision, the easiest way to implement backtracking is with exceptions and re-raising, which bubbles up through the call chain until we find a working call site. Mapping this to iteration, we’d keep popping states, modifying the previous state a bit (the except block), and if that still does not work, pop one more frame (re-raise). Which is why I feel putting failure_causes on a state feels more natural; the approach you’re taking right now is like introducing global state to functions (self._failure_causes is not managed by any call frame and is overwritten on every assignment), which works, but should be avoidable for what we’re trying to achieve, and should be avoided if practical.

notatallshaw commented 3 years ago

Which is why I feel putting failure_causes on a state feels more natural; the approach you’re taking right now is like introducing global state to functions (self._failure_causes is not managed by any call frame and is overwritten on every assignment), which works, but should be avoidable for what we’re trying to achieve, and should be avoided if practical.

This was very helpful thanks!

Any solution that involved pinning the failure causes to the state would end up having the same effect as option 1 as I mentioned here: https://github.com/sarugaku/resolvelib/pull/84#discussion_r712414959 . So first I am going to test if this approach still works in the test cases I have.

It seems also that the simplest way to do this is to have Resolution._attempt_to_pin_criterion instead of returning causes pin thems to Resolution.state, and have State have a new property "failure_causes"? Would this be a good approach to attempt or are there some mechanics of resolvelib elsewhere that I need to understand / modify instead?

uranusjr commented 3 years ago

It seems also that the simplest way to do this is to have Resolution._attempt_to_pin_criterion instead of returning causes pin thems to Resolution.state, and have State have a new property "failure_causes"? Would this be a good approach to attempt or are there some mechanics of resolvelib elsewhere that I need to understand / modify instead?

Yeah this would be the easiest approach, but maybe we can redesign Criterion.incompatibilities for this. This is a list of known incompatible candidates, I’m wondering we can turn it into a list of kind of wrapping with more information (similar to how Criterion.information is a list of wrappers to the criterion’s “parent” candidates) to achieve the same purpose with less duplication.

notatallshaw commented 3 years ago

Yeah this would be the easiest approach, but maybe we can redesign Criterion.incompatibilities for this. This is a list of known incompatible candidates, I’m wondering we can turn it into a list of kind of wrapping with more information (similar to how Criterion.information is a list of wrappers to the criterion’s “parent” candidates) to achieve the same purpose with less duplication.

Okay, I took a look at this and I need a better understanding of how Criterion.incompatibilities work, I'll do some further research.

In the mean time I was able to move failure_causes off being a global variable and on to the state, this slightly changes the behavior of when the failure_causes preferences are applied so I am running over the requirements files I have now to make sure it still solves excessive backtracking. I'll update the PR once done.

After that, I think, migrating to using Criterion.incompatibilities should just be a refactoring and not affect the behavior of backtracking.

notatallshaw commented 3 years ago

I've not updated the causes to be stored on the state instead of the Resolver instance, and I am calling them "backtrack_causes" instead of "failure_causes". This is because they are not actually the failures of the current state, but instead failures of a state which needed to be backtracked out of.

I personally feel it makes everything read better, especially in the Provider, But let me know if anyone was particularly tied to "failure_causes"

notatallshaw commented 3 years ago

FYI, I consider this PR ready, if maintainers agree I'll start fixing tests.

I took a look at merging the logic with incompatibilities but I think there's a timing issue. The failure causes are generated from a state which is deleted by backtracking. Incompatibilities are generated from the previous state.

I have to save the failure causes before that state gets deleted then pin them to the latest state that exists after backtracking.

But let me know if you're still unhappy with this approach.

uranusjr commented 3 years ago

Please fix the tests then. You can do this in a separate commit, so if we want to switch direction the tests can easily be un-changed with git revert.

notatallshaw commented 3 years ago

Please fix the tests then. You can do this in a separate commit, so if we want to switch direction the tests can easily be un-changed with git revert.

I pushed a commit that fixes the tests, turned out to be fairly trivial fix.

But the tests fail on Python 2.7 because list does not have a copy method. This is an easy fix but should I fix it? Does resolvelib support Python 2.7 still?

uranusjr commented 3 years ago

Yes 2.7 is still supported. You can use foo[:] instead, this is compatible to all supported Python versions.

notatallshaw commented 3 years ago

Yes 2.7 is still supported. You can use foo[:] instead, this is compatible to all supported Python versions.

Yup, already submitted a commit with that and and all tests pass.

notatallshaw commented 3 years ago

LGTM.

I do think we should add tests for the new behaviour though, to ensure that we’re passing valid things with a valid structure in the new argument.

I took a look at the existing tests yesterday and I couldn't find anything that tests any of the behavior here. This PR doesn't actually change the behavior of resolvelib it just gives more information for downstream library so it in turn can choose to change the ordering of the resolution.

The main breaking change here is passing a new argument to Resolution._p._get_preference, but there don't seem to be any tests on what Resolution._p is or has to provide?

It also adds an extra property to State which is a named tuple, so if any code iterated over State instead of accessed it via property names that would break. But as best as I can tell State is used only internally to resolvelib and it is never iterated?

pradyunsg commented 3 years ago

it just gives more information for downstream library so it in turn can choose to change the ordering of the resolution.

Yup yup, this is what I'm saying we should add a test for.

None the less, I also thought we had more extensive tests for this library than we actually do, so I'm happy to kick the can down the road for this. :)

pradyunsg commented 3 years ago

Hurrah. @uranusjr do you wanna cut the release, or do you think it might make sense for me to learn how to do that on this project? :)

notatallshaw commented 3 years ago

it just gives more information for downstream library so it in turn can choose to change the ordering of the resolution.

Yup yup, this is what I'm saying we should add a test for.

None the less, I also thought we had more extensive tests for this library than we actually do, so I'm happy to kick the can down the road for this. :)

I had a think about this and thought maybe provider should provide a Base or Abstract class so this can be checked / tested.

Then I looked at the code and realized there is an AbstractProvider and this method signature needs to be updated otherwise when Pip vendors this tests will break on Pips side: https://github.com/sarugaku/resolvelib/blob/master/src/resolvelib/providers.py#L12

How do I proceed fixing this PR post-merge? Make a new PR?

Also would it be possible to give Resolution._p type AbstractProvider so it can be type checked? I'm not familiar with pyi files and what gets included in them.

notatallshaw commented 3 years ago

Created new Pull Request https://github.com/sarugaku/resolvelib/pull/86 due to missing AbstractProvider.

uranusjr commented 3 years ago

@pradyunsg Feel free try your hands, I can always take over if things go south. There's a tox task to automate release, but to be honest I never remember its exact arguments and always ended up reading the source and manually invoked all the steps instead, it's actually easier than worrying about whether automation is doing the right thing.

notatallshaw commented 3 years ago

@pradyunsg Feel free try your hands, I can always take over if things go south. There's a tox task to automate release, but to be honest I never remember its exact arguments and always ended up reading the source and manually invoked all the steps instead, it's actually easier than worrying about whether automation is doing the right thing.

What are people's thoughts about making it for the pip 21.3 release? Obviously I would of preferred that, but if it can't it can't.

pradyunsg commented 3 years ago

@pradyunsg Feel free try your hands, I can always take over if things go south.

Cool, that worked out. 0.8.0 is up.

Here's the release process, which I'll note down while it's still fresh in my head:

pfmoore commented 3 years ago

That's a nice clean release process 🙂 Well done to whoever implemented the automation around that!

uranusjr commented 3 years ago

I added the bullets to DEVELOPMENT.rst, thanks for verifying the process works :p