klapaucius / vector-hashtables


Speed up division by bucket's size #28

Closed Bodigrim closed 6 months ago

Bodigrim commented 6 months ago

By choosing primes which allow for faster division, we can shave off some time.
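For context, here is the general shape of the trick, not the exact code in this PR: when the divisor comes from a fixed table of primes, one can precompute a per-prime reciprocal and replace rem with two multiplications (Lemire-style "fastmod"). The names hiMul, magic and fastRem below are purely illustrative, and the sketch assumes a 64-bit Word:

    {-# LANGUAGE MagicHash, UnboxedTuples #-}

    import Data.Word (Word32)
    import GHC.Exts (Word (W#), timesWord2#)

    -- High word of a full Word-by-Word product, via the timesWord2# primop.
    hiMul :: Word -> Word -> Word
    hiMul (W# x) (W# y) = case timesWord2# x y of (# hi, _lo #) -> W# hi

    -- Precomputed reciprocal ("magic number") for a bucket count d;
    -- assumes a 64-bit Word and d > 0.
    magic :: Word32 -> Word
    magic d = maxBound `div` fromIntegral d + 1

    -- a `rem` d computed with two multiplications instead of a division.
    fastRem :: Word -> Word32 -> Word32 -> Word32
    fastRem m d a = fromIntegral (hiMul (m * fromIntegral a) (fromIntegral d))

    main :: IO ()
    main = do
      let d = 8191 :: Word32  -- a hypothetical bucket-count prime
          m = magic d
      print (all (\a -> fastRem m d a == a `rem` d) [0, 1, 42, 123456789, maxBound])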

The exact performance gain depends on the architecture. The worst case is aarch64, where (due to GHC shortcomings) we cannot benefit from faster division algorithms at all. Nevertheless, there are some modest gains even there, mostly because the divisor is now passed unboxed and the remainders are forced, so they are unboxed Int# values as well. Here are the numbers on macOS M2:

Comparison
  1000000
    insert
      vector-hashtables boxed:        OK
        35.6 ms ± 590 μs,  3% less than baseline
      vector-hashtables unboxed keys: OK
        25.2 ms ± 486 μs,       same as baseline
      vector-hashtables:              OK
        6.59 ms ± 115 μs,       same as baseline
    insert (resize)
      vector-hashtables boxed:        OK
        47.1 ms ± 480 μs, 10% less than baseline
      vector-hashtables unboxed keys: OK
        32.9 ms ± 498 μs, 10% less than baseline
      vector-hashtables:              OK
        9.79 ms ± 110 μs, 11% less than baseline
    insert, delete
      vector-hashtables:              OK
        10.4 ms ±  55 μs,  1% less than baseline
    find
      vector-hashtables:              OK
        4.28 ms ±  53 μs,  2% less than baseline
      vector-hashtables (frozen):     OK
        1.28 ms ±  15 μs, 14% less than baseline
    lookupIndex
      vector-hashtables:              OK
        2.40 ms ±  31 μs, 11% less than baseline
    fromList
      vector-hashtables: OK
        12.6 ms ±  75 μs,  7% less than baseline
    toList
      vector-hashtables:              OK
        56.7 ms ± 1.4 ms,       same as baseline

Running the same benchmark on x86_64 shows much more pronounced benefits, with some operations up to 3x faster:

Comparison
  1000000
    insert
      vector-hashtables boxed:        OK
        44.8 ms ± 370 μs,  9% less than baseline
      vector-hashtables unboxed keys: OK
        32.2 ms ± 800 μs,  9% less than baseline
      vector-hashtables:              OK
        10.5 ms ±  55 μs, 26% less than baseline
    insert (resize)
      vector-hashtables boxed:        OK
        57.5 ms ± 223 μs, 18% less than baseline
      vector-hashtables unboxed keys: OK
        40.3 ms ± 295 μs, 24% less than baseline
      vector-hashtables:              OK
        16.2 ms ±  22 μs, 41% less than baseline
    insert, delete
      vector-hashtables:              OK
        16.2 ms ±  86 μs, 36% less than baseline
    find
      vector-hashtables:              OK
        3.91 ms ±  18 μs, 62% less than baseline
      vector-hashtables (frozen):     OK
        2.44 ms ±  14 μs, 68% less than baseline
    lookupIndex
      vector-hashtables:              OK
        3.78 ms ±  26 μs, 57% less than baseline
    fromList
      vector-hashtables: OK
        26.9 ms ± 157 μs, 14% less than baseline
    toList
      vector-hashtables:              OK
        61.7 ms ± 157 μs,  1% less than baseline
klapaucius commented 6 months ago

    find
      vector-hashtables:              OK
        3.91 ms ±  18 μs, 62% less than baseline
      vector-hashtables (frozen):     OK
        2.44 ms ±  14 μs, 68% less than baseline

That's a really impressive result.

    insert
      vector-hashtables:              OK
        6.59 ms ± 115 μs,       same as baseline
    insert (resize)
      vector-hashtables:              OK
        9.79 ms ± 110 μs, 11% less than baseline

That's probably because the new sequence of sizes grows a bit faster, so fewer resizes happened, which isn't a bad thing.
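As a rough, self-contained illustration (the initial size and growth factors below are made up, not the library's actual prime sequence), a faster-growing size sequence needs fewer resizes to reach the benchmark's 10^6 keys:

    -- Illustration only: neither the initial size 32 nor the growth factors
    -- are taken from the library's actual prime table.
    resizesNeeded :: Double -> Int -> Int -> Int
    resizesNeeded factor initial target =
      length (takeWhile (< fromIntegral target)
                        (iterate (* factor) (fromIntegral initial)))

    main :: IO ()
    main = mapM_ (\f -> print (f, resizesNeeded f 32 1000000)) [2.0, 2.2]
    -- Prints (2.0,15) and (2.2,14): a slightly faster-growing sequence saves
    -- a resize, and every resize rehashes all elements present at that point.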

Bodigrim commented 6 months ago

@klapaucius @swamp-agr @ulysses4ever anything else you'd like me to do here? Or is it acceptable?

swamp-agr commented 6 months ago

Impressive. Thank you :)