spaolacci / murmur3

Native MurmurHash3 Go implementation
BSD 3-Clause "New" or "Revised" License
947 stars 127 forks source link

Remove extra reference to `digest128` #23

Closed smira closed 5 years ago

smira commented 5 years ago

This extra reference isn't really required as this object should be anyway allocated on the stack.

I wasn't able to reproduce it with micro-benchmarks for murmur3, but with bigger library I'm working on this leads to one extra memory allocation when slice passed to Sum128() is taken from byte array.

Performance improvement is small, but noticeable for smaller slices (that's for Benchmark128):

name        old time/op    new time/op    delta
128/1-8       29.6ns ± 1%    25.0ns ± 1%  -15.42%  (p=0.008 n=5+5)
128/2-8       30.0ns ± 2%    25.9ns ± 2%  -13.66%  (p=0.008 n=5+5)
128/4-8       30.5ns ± 2%    27.5ns ± 2%   -9.83%  (p=0.008 n=5+5)
128/8-8       31.7ns ± 1%    28.8ns ± 2%   -8.97%  (p=0.008 n=5+5)
128/16-8      28.9ns ± 1%    26.0ns ± 1%  -10.10%  (p=0.008 n=5+5)
128/32-8      32.3ns ± 2%    28.9ns ± 1%  -10.64%  (p=0.008 n=5+5)
128/64-8      39.2ns ± 1%    35.0ns ± 1%  -10.66%  (p=0.008 n=5+5)
128/128-8     51.3ns ± 2%    47.6ns ± 1%   -7.14%  (p=0.008 n=5+5)
128/256-8     75.0ns ± 3%    72.3ns ± 2%   -3.55%  (p=0.016 n=5+5)
128/512-8      127ns ± 2%     125ns ± 1%     ~     (p=0.056 n=5+5)
128/1024-8     223ns ± 1%     225ns ± 2%     ~     (p=0.183 n=5+5)
128/2048-8     420ns ± 1%     425ns ± 1%     ~     (p=0.056 n=5+5)
128/4096-8     822ns ± 2%     815ns ± 2%     ~     (p=0.333 n=5+5)
128/8192-8    1.60µs ± 2%    1.61µs ± 3%     ~     (p=0.516 n=5+5)

name        old speed      new speed      delta
128/1-8     33.8MB/s ± 1%  40.0MB/s ± 1%  +18.27%  (p=0.008 n=5+5)
128/2-8     66.6MB/s ± 2%  77.1MB/s ± 2%  +15.76%  (p=0.008 n=5+5)
128/4-8      131MB/s ± 2%   145MB/s ± 2%  +10.96%  (p=0.008 n=5+5)
128/8-8      253MB/s ± 1%   278MB/s ± 2%   +9.86%  (p=0.008 n=5+5)
128/16-8     554MB/s ± 1%   616MB/s ± 1%  +11.26%  (p=0.008 n=5+5)
128/32-8     990MB/s ± 2%  1108MB/s ± 1%  +11.90%  (p=0.008 n=5+5)
128/64-8    1.63GB/s ± 1%  1.83GB/s ± 2%  +11.95%  (p=0.008 n=5+5)
128/128-8   2.49GB/s ± 2%  2.69GB/s ± 1%   +7.73%  (p=0.008 n=5+5)
128/256-8   3.42GB/s ± 3%  3.54GB/s ± 2%   +3.66%  (p=0.016 n=5+5)
128/512-8   4.00GB/s ± 2%  4.09GB/s ± 1%   +2.23%  (p=0.032 n=5+5)
128/1024-8  4.58GB/s ± 1%  4.53GB/s ± 2%     ~     (p=0.222 n=5+5)
128/2048-8  4.87GB/s ± 1%  4.81GB/s ± 1%   -1.12%  (p=0.032 n=5+5)
128/4096-8  4.98GB/s ± 3%  5.02GB/s ± 2%     ~     (p=0.421 n=5+5)
128/8192-8  5.11GB/s ± 2%  5.08GB/s ± 3%     ~     (p=0.548 n=5+5)
lvrixp commented 5 years ago

👍

smira commented 5 years ago

@spaolacci any feedback from you?

spaolacci commented 5 years ago

Sorry for the more than late answer. I can't observe the allocation anymore as of Go 1.12 at least - I guess escape analysis eventually found it somewhere in the between - but those two references are intentionally useless, so better without them.

I did run a more comprehensive allocation check. All the regular calls are zero allocation, streaming hashers are exactly one allocation.