spaolacci / murmur3

Native MurmurHash3 Go implementation
BSD 3-Clause "New" or "Revised" License
947 stars 127 forks source link

Murmur128: use Go 1.9 bits.RotateLeft64 functions #22

Closed maciej closed 5 years ago

maciej commented 6 years ago

This PR introduces bits.RotateLeft64 in place of 2 shifts + binary or operations

Sadly does not look like it improves performance (tested on a 2017 15' MacBookPro, Go 1.10.3)

without bits.RotateLeft64:

go test -run - -bench Benchmark128 .
goos: darwin
goarch: amd64
pkg: github.com/spaolacci/murmur3
Benchmark128/1-8    100000000           18.8 ns/op    53.20 MB/s
Benchmark128/2-8    100000000           20.0 ns/op   100.16 MB/s
Benchmark128/4-8    100000000           20.6 ns/op   194.31 MB/s
Benchmark128/8-8    100000000           23.0 ns/op   348.01 MB/s
Benchmark128/16-8           100000000           20.3 ns/op   787.21 MB/s
Benchmark128/32-8           100000000           23.0 ns/op  1391.11 MB/s
Benchmark128/64-8           50000000            28.8 ns/op  2223.46 MB/s
Benchmark128/128-8          50000000            38.9 ns/op  3292.33 MB/s
Benchmark128/256-8          20000000            62.7 ns/op  4083.19 MB/s
Benchmark128/512-8          20000000           111 ns/op    4577.90 MB/s
Benchmark128/1024-8         10000000           209 ns/op    4877.98 MB/s
Benchmark128/2048-8          3000000           419 ns/op    4885.73 MB/s
Benchmark128/4096-8          2000000           804 ns/op    5093.00 MB/s
Benchmark128/8192-8          1000000          1589 ns/op    5154.91 MB/s

with bits.RotateLeft64:

go test -run - -bench Benchmark128 .
goos: darwin
goarch: amd64
pkg: github.com/spaolacci/murmur3
Benchmark128/1-8    100000000           18.6 ns/op    53.65 MB/s
Benchmark128/2-8    100000000           19.0 ns/op   105.24 MB/s
Benchmark128/4-8    100000000           20.2 ns/op   198.10 MB/s
Benchmark128/8-8    100000000           22.9 ns/op   348.85 MB/s
Benchmark128/16-8           100000000           20.4 ns/op   786.13 MB/s
Benchmark128/32-8           100000000           24.2 ns/op  1322.43 MB/s
Benchmark128/64-8           50000000            28.3 ns/op  2257.56 MB/s
Benchmark128/128-8          30000000            39.0 ns/op  3279.77 MB/s
Benchmark128/256-8          20000000            63.4 ns/op  4036.51 MB/s
Benchmark128/512-8           20000000          113 ns/op    4520.35 MB/s
Benchmark128/1024-8           10000000         222 ns/op    4604.64 MB/s
Benchmark128/2048-8          3000000           432 ns/op    4739.47 MB/s
Benchmark128/4096-8          2000000           781 ns/op    5240.30 MB/s
Benchmark128/8192-8          1000000          1551 ns/op    5280.24 MB/s

I'm leaving it up for discussion. Perhaps someone else could run the benchmarks on different hardware.

spaolacci commented 5 years ago

This doesn't improve performance because initial hand written implementation was crafted to trigger Go compiler optimization.

Go 1.9 is old enough now. Having an explicit rotate left is however nice - if only for readability - so I the patch is still positive.

Thanks for proposing that, I'll update rotate 32 as well along the same lines.