Closed Eh2406 closed 6 months ago
Just cherry picking 976d03ed17ba64af56279b8844afe69892065383 onto our tagged branch speeds my test up from a good 4s to 1.7s-1.8s! We'll have to rebase onto the main to measure with the others speedups, once that is done i'll make a proper hyperfine benchmark and report back.
That sounds like a big win! Hopefully this can get a review and get merged soon.
When i cherry-pick 976d03ed17ba64af56279b8844afe69892065383 onto f64c499f83e5798a583f9850fac28d2e7bf1dfbf the speedup is lower but still there. For the tf-models-nightly
benchmark, which includes running through a lot of incompatible versions:
Benchmark 1: main
Time (mean ± σ): 5.235 s ± 0.084 s [User: 4.932 s, System: 0.295 s]
Range (min … max): 5.157 s … 5.417 s 10 runs
Benchmark 2: branch
Time (mean ± σ): 4.875 s ± 0.066 s [User: 4.578 s, System: 0.291 s]
Range (min … max): 4.795 s … 4.970 s 10 runs
Summary
branch ran
1.07 ± 0.02 times faster than main
That makes sense. When #163 applies, as it does in your case, it entirely removes the old incompatibilities from consideration. This PR keeps track of the fact that the old incompatibilities can be skipped, which is much less useful if they're not being considered at all. I'm glad to see that despite the redundancy this PR pulls its weight.
It is easy to imagine cases where #163 does not apply, where this PR could have a much larger impact. Closer to the one you originally measured.
This extends the work from #88. To quote:
Looking at our normal benchmarks this is still within the noise. Looking at
hobo
it is a 5% improvement. However it does reduce the number of times intersection is called, and our users are telling us that intersections in their code are more expensive than in our benchmarks.There are more efficient alternatives to calling
retain(|_, dl| *dl <= decision_level)
. (The insertion order is also sorted bydl
, so binary search and truncate is an option. Or ...) But I have not seen it in profiles so I don't think it is worth more complexity at this time.If this is a big win for our users then we should merge soon. Otherwise, this can stick around until other optimizations make the background noise smaller.