sarugaku / resolvelib

Resolve abstract dependencies into concrete ones
ISC License
138 stars 31 forks source link

Option to only provide causes of conflict when calling get_preference #131

Open notatallshaw opened 1 year ago

notatallshaw commented 1 year ago

When https://github.com/sarugaku/resolvelib/pull/84 was implemented it gave the downstream library the information about whether something was the cause of a backtracking conflict or not.

However using this information is problematic from a performance perspective as it can create an O(n2) situation as the entire list of names is checked over and for each name the entire list of backtrack causes is checked over. Although in most cases n is small enough that it's negligible, it has created real world performance issues (https://github.com/pypa/pip/pull/10621).

There have been of a couple of unmerged PRs to attempt to fix this via either by doing some fancy caching logic in Pip or defining a formal object structure for what a "Cause" object should look like and then add cache on to that object.

However a much simpler solution is to take the implicit loop out of get_preference and when there is a conflict for resolvelib to only pass names to the downstream library that are part of the backtrack cause. It seems to be this information is sufficiently generic for a resolving library that it can be an option of the algorithm itself and not just a preference of the downstream library.

I will start to make a PR and see how it performs, I opened this issue now to see if there is any direct objections to this approach.

notatallshaw commented 1 year ago

Initial testing for this looks good, I consistently see a ~2.5 second save on running python -m pip download apache-airflow[all]==2.6.1 (with all downloads cached) or slightly above 1% of the total time, which might not sound like much but most of the time is being spent collecting requirements and building packages in isolated environments to extract the metadata so the fact the performance improvement is measurable and consistent is a big win here.

But actually I think there's an even bigger improvement from this approach, it allows to set a reporting event that lets the downstream library log the cause of backtracking, I think this will be very useful for users trying to debug why backtracking is happening in the first place.

I'll try and post a PR soon.

notatallshaw commented 8 months ago

My initial approach, https://github.com/sarugaku/resolvelib/pull/132, made incorrect assumptions about what information resolvelib could guarantee it has access to.

This would now be solved by https://github.com/sarugaku/resolvelib/pull/145, the provider would need to implement on their side which causes to filter out.