Closed ecstatic-morse closed 3 years ago
Also, we still use the standard library's binary search implementation in some places, which doesn't have the second (admittedly less important) optimization. If rust-lang/rust#74024 lands, we'll definitely want to switch to a custom impl in every case, since that change will double the number of comparisons in the inner loop for little benefit.
You might want to `r?` somebody to get their attention.
We have talked about this PR on Zulip; hopefully Niko has some time for it (if not, we will look at it at the next sprint).
I don't think bors is active on this repo, so let's ask manually via github 🙏
Looks good to me!
We spend about 30% of cycles doing binary search in `ExtendWith::count` while running `clap-rs/app-parser-{{impl}}-add_defaults`. This indicates to me that we need to explore different storage for the `cfg_edge` relation, but a maximally optimal `binary_search` is beneficial regardless.

This PR switches to `get_unchecked` to eliminate a bounds check that the optimizer cannot. This is also done in the standard library implementation.

It also takes advantage of a lesser known invariant of `Vec`, that the capacity cannot exceed `isize::MAX` bytes, to compute the midpoint using fewer instructions (there's a fun article from the early internet about this).

As a result, the aforementioned test case on a Ryzen 4700U laptop goes from this:
to this:
Wall time is highly variable and may differ on your platform.
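For readers skimming the thread, here is a minimal sketch of the two optimizations described above. It is not the code from this PR: the function name `lower_bound_sketch` and its predicate-based signature are assumptions made for illustration. The up-front `assert!` is also an addition of this sketch; it makes the `isize::MAX` length bound explicit so the plain midpoint sum stays sound even for slices of zero-sized types.

```rust
/// Hypothetical helper (not the PR's actual code): returns the index of the
/// first element for which `pred` is false, assuming `pred` is true for a
/// prefix of `slice` and false for the rest.
pub fn lower_bound_sketch<T>(slice: &[T], mut pred: impl FnMut(&T) -> bool) -> usize {
    // Allocations cannot exceed `isize::MAX` bytes, so a slice of
    // non-zero-sized elements never has more than `isize::MAX` elements.
    // The assert (an assumption of this sketch) also covers zero-sized types.
    assert!(slice.len() <= isize::MAX as usize);

    let mut lo = 0;
    let mut hi = slice.len();

    while lo < hi {
        // `lo < hi <= isize::MAX`, so `lo + hi` cannot overflow `usize`.
        // This avoids the extra subtraction in `lo + (hi - lo) / 2`.
        let mid = (lo + hi) / 2;

        // SAFETY: `lo <= mid < hi <= slice.len()`, so `mid` is in bounds.
        let el = unsafe { slice.get_unchecked(mid) };

        if pred(el) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }

    lo
}
```

With a sorted slice such as `[1, 3, 5, 7, 9]`, calling `lower_bound_sketch(&v, |&x| x < 5)` yields the index of the first element that is not less than 5. Whether either change pays off in practice depends on the workload, so it is worth measuring on your own platform.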
cc @shepmaster (since they seem to be interested in this sort of thing)