Build a benchmark based on UV slow case

Eh2406 commented 8 months ago

Our production user is hitting slow cases in ways that are hard for us to optimize separately. We should have clear instructions for how to reproduce the slow cases they are hitting, and also similarly simulacrums of them that can be run with less infrastructure. cc #177

konstin commented 8 months ago

How to use uv, a python dependency resolver, as benchmark. The example we are using, bio_embeddings[all] on python 3.12 is pathological case that has to try 100k versions. It

Install python 3.12
git clone https://github.com/astral-sh/pubgrub (using the default branch, main)
git clone https://github.com/astral-sh/uv
Edit the top level Cargo.toml in uv to point to the pubgrub checkout
cargo build --profile profiling && mv target/profiling/uv main-uv
Edit pubgrub, e.g. cherry-pick your commits. Remember that we have downstream patches, so dev will fail to build uv.
cargo build --profile profiling && mv target/profiling/uv branch-uv
Run the hyperfine command, e.g. taskset -c 0 hyperfine --warmup 1 "./main-uv pip compile -p 3.12 --exclude-newer 2024-03-01 scripts/requirements/bio_embeddings.in" "./branch-uv pip compile -p 3.12 --exclude-newer 2024-03-01 scripts/requirements/bio_embeddings.in" where bio_embeddings.in contains bio_embeddings[all]. You may or may not want to use the taskset, for me it helps by avoiding the efficiency cores and reducing variance. Also don't forget to set your cpu to performance mode, this is very important for variance on my machine.
The original report for #135 can be reproduced with scripts/requirements/boto3.in.
As baseline i use scripts/requirements/jupyter.in, where nearly always the first version fits, so it shouldn't change.
You can use the profiling build with samply, e.g. samply record --rate 5000 target/profiling/uv pip compile scripts/requirements/bio_embeddings.in

Eh2406 commented 8 months ago

After a lot of work, I have been able to reproduce your set up. Here is a profile I was able to collect https://share.firefox.dev/43ioU30

by hacking in a OfflineDependencyProvider<P, VS>, adding dependencies to it in add_incompatibility_from_dependencies, and serializing it when done I was able to get a RON file. By various dirty tricks (calling to_string on P, manually fixing prereleases and other such complications) I converted into something that can be run in this repo without additional dependencies. bio_embeddings_str_SemanticVersion.txt

Here is a profile of running it through our normal large case benchmark runner. https://share.firefox.dev/495a8hG Unfortunately it still looks quite different. I think one biggest difference is your incorporation of the RPITIT branch. Another safe difference is that large case attempts to resolve every version not just the root version. It therefore biases data toward the problems of small dependency trees which tend to be dominated by the process of reading in data. Alternatively it may have to do with the difference in representation or semantics between &str vs PubGrubPackage or SemanticVersion vs Version. Or it may have to do with not representing filtered out versions.

Definitely more investigation is required.

Eh2406 commented 8 months ago

I think one biggest difference is your incorporation of the RPITIT branch. Another safe difference is that large case attempts to resolve every version not just the root version. It therefore biases data toward the problems of small dependency trees which tend to be dominated by the process of reading in data.

I ran the new benchmark file on the RPITIT branch and I altered our large case to only resolve if p.to_string() == "root". The resulting profile looks the same. So the process of elimination continues.

Alternatively it may have to do with the difference in representation or semantics between &str vs PubGrubPackage or SemanticVersion vs Version.

String with Arc<SemanticVersion> still looks the same. So time to switch tactics from modifying PubGrub to get the ron filed cooperate to looking at uv.

Or it may have to do with not representing filtered out versions.

Using lots of dirty hacks I was able to instrument add_incompatibility as well as add_incompatibility_from_dependencies. The resulting file also needed to be cleaned up by hand. bio_embeddings_2_str_SemanticVersion.txt however the resulting profile still looks the same. https://share.firefox.dev/3Pso4LE

Alternatively it may have to do with the difference in representation or semantics between &str vs PubGrubPackage or SemanticVersion vs Version.

This time I took the output of my instrumented uv street without any hand fixes. bio_embeddings_3_str_pep440.txt and changed large case to use the pep440_rs crate (with the appropriate suffix on file). The resulting profile is not fundamentally different. https://share.firefox.dev/49XrNsF

konstin commented 8 months ago

I'm seeing pick_highest_priority_package is big in your profile. In uv, we pick priority by using the package we've seen first as first. When we visit root, we iterate over root's deps and give the first the highest priority, the second the second highest, etc. When we selected foo, we visit all deps of foo and also queue them to our priority mapping.

Eh2406 commented 8 months ago

That does make pick_highest_priority_package O(1), but it also pushes a lot of work into the guts of the resolver. So for example in this case with your FIFO .map(|priority| priority + 1) the run time is 7.62s. But if we picked FILO .map(|priority| !(priority + 1)) it takes 0.18s. Without a consistent and logical ordering for package arrival these are equally arbitrary. I'm sure there are other cases where FIFO beats FILO. (The ordering is not entirely arbitrary, they are ordered by when they were discovered which correlates with when they were first pre-fetched. So FIFO is a good default for not waiting on the network.)

The OfflineDependencyProvider uses "fewest number of versions matching the requirement". I talk about the history of this heuristic https://github.com/astral-sh/uv/issues/1398#issuecomment-1952635650

Eh2406 commented 2 months ago

In my work using PubGrub for Cargo packages I am now also seeing an extraordinary amount of time in pick_highest_priority_package. To the point where returning a constant 1 ends up being faster than a "count matches" heuristic, on my main benchmark. However, this is by making most simple cases faster and making the slowest cases much slower. The pathological cases that were taking ~1min now timeout. The semantics I really want are the ones from #252, I want to use a fast heuristic until we've backtracked and then switch to a more information dense heuristic once backtracking has occurred.

Should we expose that information to our user? And if so how? If not, should we weaken our contract with prioritize so that it gets called less before backtracking?

konstin commented 2 months ago

Fwiw we changed the prioritization in uv so that packages with more concrete specifiers get prioritized over less specific ones (https://github.com/astral-sh/uv/blob/981661c4af9f5ed1a9cccbef9f87fd7c7c8c8b7d/crates/uv-resolver/src/pubgrub/priority.rs)

pubgrub-rs / pubgrub

Build a benchmark based on UV slow case #191