Closed Bodigrim closed 6 months ago
find
  vector-hashtables: OK
    3.91 ms ± 18 μs, 62% less than baseline
  vector-hashtables (frozen): OK
    2.44 ms ± 14 μs, 68% less than baseline
That's a really impressive result.
insert
  vector-hashtables: OK
    6.59 ms ± 115 μs, same as baseline
insert (resize)
  vector-hashtables: OK
    9.79 ms ± 110 μs, 11% less than baseline
That's probably because the new sequence of sizes grows a bit faster, so fewer resizes happened. Which isn't a bad thing.
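To illustrate why a faster-growing size sequence means fewer resizes, here is a rough sketch. The two sequences (`doubling`, `tripling`) and the helper `resizesNeeded` are made up for illustration and are not the library's actual prime sequences. The sketch also assumes a resize happens exactly when the element count outgrows the current capacity, ignoring load factors.

```haskell
-- Number of resizes needed to hold n elements, given an ascending
-- (possibly infinite) sequence of capacities. Every capacity strictly
-- smaller than n is outgrown once, and each outgrowing costs one resize.
resizesNeeded :: [Int] -> Int -> Int
resizesNeeded sizes n = length (takeWhile (< n) sizes)

-- Hypothetical growth sequences, NOT the ones used by vector-hashtables.
doubling :: [Int]
doubling = iterate (* 2) 8   -- 8, 16, 32, 64, 128, ...

tripling :: [Int]
tripling = iterate (* 3) 8   -- 8, 24, 72, 216, ...
```

Under these assumptions, inserting 100 elements outgrows four capacities of `doubling` (8, 16, 32, 64) but only three of `tripling` (8, 24, 72), so the faster sequence saves one resize.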
@klapaucius @swamp-agr @ulysses4ever anything else you'd like me to do here? Or is it acceptable?
Impressive. Thank you :)
By choosing primes which allow for faster division, we can shave off some time.
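As a minimal sketch of the general idea (not necessarily how this PR implements it): if the table capacity is always one of a small, fixed set of primes, the remainder can be computed in a branch where the divisor is a literal. A divisor known at compile time gives the code generator a chance to replace the division instruction with a multiplication and shift, although whether that actually happens depends on the GHC version and target. The primes below and the names `remByKnownPrime#` and `bucketIndex` are hypothetical.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Int#, remInt#)

-- Remainder of a hash by the table capacity, with both values unboxed.
-- Each known capacity gets its own branch with a literal divisor; any
-- other capacity falls back to a generic remainder.
-- The primes listed here are illustrative only.
remByKnownPrime# :: Int# -> Int# -> Int#
remByKnownPrime# h p = case p of
  7#   -> remInt# h 7#
  17#  -> remInt# h 17#
  37#  -> remInt# h 37#
  79#  -> remInt# h 79#
  163# -> remInt# h 163#
  _    -> remInt# h p

-- Boxed wrapper for convenience. Assumes a non-negative hash, since
-- remInt# returns a negative result for a negative dividend.
bucketIndex :: Int -> Int -> Int
bucketIndex (I# h) (I# p) = I# (remByKnownPrime# h p)
```

For example, `bucketIndex 100 7` takes the literal-divisor branch, while `bucketIndex 100 11` goes through the generic fallback; both return the ordinary remainder.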
The exact performance depends on the architecture. The worst possible case is aarch64, where (due to GHC shortcomings) we cannot benefit from faster division algorithms at all. Nevertheless, there are some modest gains, mostly because the divisor is now passed unboxed and remainders are forced, so they are unboxed Int# as well.
Here are numbers on macOS M2:
Running the same benchmark on x86_64 demonstrates much more pronounced benefits, up to 3x faster: