perf: invalidate less contradicted_incompatibilities

pubgrub-rs / pubgrub

PubGrub version solving algorithm implemented in Rust

https://pubgrub-rs.github.io/pubgrub/pubgrub/

Mozilla Public License 2.0

337 stars 30 forks source link

perf: invalidate less contradicted_incompatibilities #170

Closed Eh2406 closed 6 months ago

Eh2406 commented 6 months ago

This extends the work from #88. To quote:

It is possible to figure out witch IncompIds are still used after a backtrack, but so far nothing that is pulls its weight.

Looking at our normal benchmarks this is still within the noise. Looking at hobo it is a 5% improvement. However it does reduce the number of times intersection is called, and our users are telling us that intersections in their code are more expensive than in our benchmarks.

There are more efficient alternatives to calling retain(|_, dl| *dl <= decision_level). (The insertion order is also sorted by dl, so binary search and truncate is an option. Or ...) But I have not seen it in profiles so I don't think it is worth more complexity at this time.

If this is a big win for our users then we should merge soon. Otherwise, this can stick around until other optimizations make the background noise smaller.

konstin commented 6 months ago

Just cherry picking 976d03ed17ba64af56279b8844afe69892065383 onto our tagged branch speeds my test up from a good 4s to 1.7s-1.8s! We'll have to rebase onto the main to measure with the others speedups, once that is done i'll make a proper hyperfine benchmark and report back.

Eh2406 commented 6 months ago

That sounds like a big win! Hopefully this can get a review and get merged soon.

konstin commented 6 months ago

When i cherry-pick 976d03ed17ba64af56279b8844afe69892065383 onto f64c499f83e5798a583f9850fac28d2e7bf1dfbf the speedup is lower but still there. For the tf-models-nightly benchmark, which includes running through a lot of incompatible versions:

Benchmark 1: main
  Time (mean ± σ):      5.235 s ±  0.084 s    [User: 4.932 s, System: 0.295 s]
  Range (min … max):    5.157 s …  5.417 s    10 runs

Benchmark 2: branch
  Time (mean ± σ):      4.875 s ±  0.066 s    [User: 4.578 s, System: 0.291 s]
  Range (min … max):    4.795 s …  4.970 s    10 runs

Summary
  branch ran
    1.07 ± 0.02 times faster than main

Eh2406 commented 6 months ago

That makes sense. When #163 applies, as it does in your case, it entirely removes the old incompatibilities from consideration. This PR keeps track of the fact that the old incompatibilities can be skipped, which is much less useful if they're not being considered at all. I'm glad to see that despite the redundancy this PR pulls its weight.

It is easy to imagine cases where #163 does not apply, where this PR could have a much larger impact. Closer to the one you originally measured.