Faster iteration - Githubissues

ltratt commented 5 years ago

The basic idea here is to deal with a common case which is that we have blocks with all bits set/unset: we can deal with these without doing anything more than a load and a compare. Rethinking the loop allows us to make this case as fast as possible; if we don't hit it, we fall back to the general case. This does make the code a bit harder to follow, but it handles the edge cases well: iterating over a Vob with iter_set_bits and all bits set is a bit over 3x faster with this change, with the general case statistically unchanged. Iterating over a Vob with iter_set_bits and all bits unset is about 40% faster, though this means it changes from "very fast" to "very very fast" (i.e. I doubt anyone will notice, given the absolute speed was high before).

Benchmarks from my machine before:

  test and                    ... bench:         429 ns/iter (+/- 9)
  test empty                  ... bench:         131 ns/iter (+/- 1)
  test extend                 ... bench:     338,070 ns/iter (+/- 3,002)
  test extend_vob_aligned     ... bench:       1,981 ns/iter (+/- 34)
  test extend_vob_not_aligned ... bench:       3,563 ns/iter (+/- 65)
  test from_bytes             ... bench:       1,900 ns/iter (+/- 54)
  test iter_all_set_bits      ... bench:     303,086 ns/iter (+/- 10,734)
  test iter_all_unset_bits    ... bench:       1,423 ns/iter (+/- 43)
  test iter_set_bits          ... bench:     159,042 ns/iter (+/- 3,711)
  test or                     ... bench:         429 ns/iter (+/- 4)
  test split_off              ... bench:       2,238 ns/iter (+/- 25)
  test xor                    ... bench:         429 ns/iter (+/- 5)

and after:

  test and                    ... bench:         429 ns/iter (+/- 4)
  test empty                  ... bench:         123 ns/iter (+/- 0)
  test extend                 ... bench:     338,047 ns/iter (+/- 3,640)
  test extend_vob_aligned     ... bench:       1,984 ns/iter (+/- 26)
  test extend_vob_not_aligned ... bench:       3,532 ns/iter (+/- 80)
  test from_bytes             ... bench:       1,900 ns/iter (+/- 36)
  test iter_all_set_bits      ... bench:      90,804 ns/iter (+/- 1,389)
  test iter_all_unset_bits    ... bench:         895 ns/iter (+/- 9)
  test iter_set_bits          ... bench:     161,703 ns/iter (+/- 5,251)
  test or                     ... bench:         429 ns/iter (+/- 5)
  test split_off              ... bench:       2,288 ns/iter (+/- 27)
  test xor                    ... bench:         428 ns/iter (+/- 21)

Notice that there is some variation in the iter_all_set_bits benchmark: sometimes it's much faster again (~65,000 ns/iter). I don't know why.

These numbers are, unsurprisingly, virtually identical for iter_unset_bits.

Since (assuming I haven't made any mistakes!), this is a simple change, this also seems like a good candidate for a new release.

ltratt commented 5 years ago

Annoyingly the tests are failing probably only because there isn't a rustfmt for nightly yet...

ltratt commented 5 years ago

Even if that latest commit does the trick, there's still going to be some squashing necessary...

ltratt commented 5 years ago

I'm fine with the new benchmark, though this PR doesn't change its performance either way. [Which makes sense, I guess: you've only got a 1 in 256 chance of hitting a block with fully set bits so you'd expect a roughly 0.4% speed increase, which is within the measurement noise.]

Should I squash? [I'll leave your commit separate.]

thomwiggers commented 5 years ago

Right, I hadn't fully thought the probabilities through... But I guess this PR is strongly optimising for the everything is T::max_value() or 0 cases. Cases where those cases are interleaved in some weird way may be slightly different, but I'm a) not sure how realistic that is and b) too lazy to implement those benchmarks :stuck_out_tongue:

thomwiggers commented 5 years ago

You could squash, as far as I'm concerned. Perhaps re-order the commits a bit as well.

ltratt commented 5 years ago

Reordered and squashed. Note that I changed CHANGES.md to have the new release be today (rather than yesterday, which make it out of sync with crates.io).

softdevteam / vob

Faster iteration #36