Closed ltratt closed 5 years ago
Annoyingly, the tests are probably failing only because there isn't a rustfmt for nightly yet...
Even if that latest commit does the trick, there's still going to be some squashing necessary...
I'm fine with the new benchmark, though this PR doesn't change its performance either way. [Which makes sense, I guess: you've only got a 1 in 256 chance of hitting a block with fully set bits so you'd expect a roughly 0.4% speed increase, which is within the measurement noise.]
Should I squash? [I'll leave your commit separate.]
Right, I hadn't fully thought the probabilities through... But I guess this PR is strongly optimising for the cases where every block is either T::max_value() or 0. Inputs where those two kinds of block are interleaved in some weird way may behave slightly differently, but I'm a) not sure how realistic that is and b) too lazy to implement those benchmarks :stuck_out_tongue:
You could squash, as far as I'm concerned. Perhaps re-order the commits a bit as well.
Reordered and squashed. Note that I changed CHANGES.md to have the new release be today (rather than yesterday, which made it out of sync with crates.io).
The basic idea here is to deal with a common case which is that we have blocks with all bits set/unset: we can deal with these without doing anything more than a load and a compare. Rethinking the loop allows us to make this case as fast as possible; if we don't hit it, we fall back to the general case. This does make the code a bit harder to follow, but it handles the edge cases well: iterating over a Vob with iter_set_bits and all bits set is a bit over 3x faster with this change, with the general case statistically unchanged. Iterating over a Vob with iter_set_bits and all bits unset is about 40% faster, though this means it changes from "very fast" to "very very fast" (i.e. I doubt anyone will notice, given the absolute speed was high before).
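To make the idea concrete, here is a minimal sketch of the fast-path/fallback structure described above. It is not the code in this PR: it assumes plain u64 blocks in a slice, and set_bit_indices is a hypothetical helper name, not part of Vob's API. It also collects indices into a Vec rather than implementing an iterator, purely to keep the example short.

```rust
/// Hypothetical helper (not Vob's API): return the index of every set bit
/// in `blocks`, treating bit 0 of blocks[0] as index 0.
fn set_bit_indices(blocks: &[u64]) -> Vec<usize> {
    const BITS: usize = u64::BITS as usize;
    let mut out = Vec::new();
    for (i, &block) in blocks.iter().enumerate() {
        let base = i * BITS;
        if block == 0 {
            // Fast path: no bits set, so a load and a compare is all we do.
            continue;
        }
        if block == u64::MAX {
            // Fast path: every bit set, so emit the whole index range
            // without inspecting individual bits.
            out.extend(base..base + BITS);
            continue;
        }
        // General case: repeatedly find and clear the lowest set bit.
        let mut b = block;
        while b != 0 {
            out.push(base + b.trailing_zeros() as usize);
            b &= b - 1;
        }
    }
    out
}
```

Calling set_bit_indices(&[u64::MAX, 0, 0b1010]) takes the all-set fast path for the first block, skips the second, and falls back to the general case for the third. Only the general case does per-bit work, which is why a Vob that is entirely set or entirely unset benefits most.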
Benchmarks from my machine before:
and after:
Notice that there is some variation in the iter_all_set_bits benchmark: sometimes it's much faster again (~65,000 ns/iter). I don't know why.
These numbers are, unsurprisingly, virtually identical for iter_unset_bits.
Since this is a simple change (assuming I haven't made any mistakes!), it also seems like a good candidate for a new release.