Closed: bchapuis closed this issue 3 years ago.
Today, some operations are performed inline (without function calls) for performance reasons. What is the impact of function calls in this situation?
When I implemented the chunker last year, it made a significant difference (I started out implementing it this way), but we can try again with a recent version of Go. I don't think I have the benchmarks or the code anywhere any more, but we can easily change things, run the benchmarks, and use benchcmp to compare the results.
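For reference, here is a minimal sketch (with hypothetical names, not the actual test code of this repository) of how such a throughput benchmark is typically structured with the standard testing package; b.SetBytes is what produces the MB/s column in the output below:

```go
package chunker_test

import (
	"math/rand"
	"testing"
)

// chunkAll is a stand-in for the code under test: in the real benchmark it
// would run the chunker over data (optionally hashing each chunk with
// SHA-256 or MD5) and return the number of chunks found.
func chunkAll(data []byte) int {
	return len(data) / (1 << 20) // placeholder: pretend chunks are 1 MiB
}

// BenchmarkChunkerSketch shows where the MB/s column comes from:
// b.SetBytes declares how many bytes one iteration processes, and the
// testing package derives the throughput from that.
func BenchmarkChunkerSketch(b *testing.B) {
	data := make([]byte, 32*1024*1024)
	rand.Read(data)

	b.SetBytes(int64(len(data)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if chunkAll(data) == 0 {
			b.Fatal("no chunks found")
		}
	}
}
```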
Here are the results of the benchmark for the master branch:
PASS
BenchmarkChunkerWithSHA256 10 122541194 ns/op 85.57 MB/s
--- BENCH: BenchmarkChunkerWithSHA256
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
BenchmarkChunkerWithMD5 20 82170451 ns/op 127.61 MB/s
--- BENCH: BenchmarkChunkerWithMD5
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
BenchmarkChunker 20 66481903 ns/op 157.72 MB/s
--- BENCH: BenchmarkChunker
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
BenchmarkNewChunker 20000 77459 ns/op
BenchmarkPolDivMod 20000000 63.6 ns/op
BenchmarkPolDiv 20000000 64.2 ns/op
BenchmarkPolMod 20000000 65.7 ns/op
BenchmarkPolDeg 30000000 41.6 ns/op
BenchmarkRandomPolynomial 5 452257005 ns/op
BenchmarkPolIrreducible 5 262396181 ns/op
ok github.com/restic/chunker 20.659s
I just tried to separate the chunker from the rolling hash function in a local branch and here are the results of the benchmark:
PASS
BenchmarkChunkerWithSHA256 10 141810242 ns/op 73.94 MB/s
--- BENCH: BenchmarkChunkerWithSHA256
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
BenchmarkChunkerWithMD5 10 103197901 ns/op 101.61 MB/s
--- BENCH: BenchmarkChunkerWithMD5
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
BenchmarkChunker 20 81411794 ns/op 128.80 MB/s
--- BENCH: BenchmarkChunker
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
chunker_test.go:274: 6 chunks, average chunk size: 1747626 bytes
BenchmarkNewChunker 20000 78330 ns/op
BenchmarkPolDivMod 30000000 58.8 ns/op
BenchmarkPolDiv 20000000 64.8 ns/op
BenchmarkPolMod 20000000 64.1 ns/op
BenchmarkPolDeg 50000000 40.5 ns/op
BenchmarkRandomPolynomial 3 507418834 ns/op
BenchmarkPolIrreducible 5 266704150 ns/op
ok github.com/restic/chunker 20.340s
So the separation seems to add roughly 15-25% overhead on the chunking benchmarks.
Having an interface would still leave open the possibility of several chunker implementations, including a highly optimized one.
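To make the trade-off concrete, here is a minimal sketch of what the separated rolling hash could look like; the names are hypothetical and not the actual restic/chunker API. The overhead measured above comes from the fact that the chunker's scan loop would call Roll and Sum64 once per input byte through dynamic dispatch, which also prevents inlining of the per-byte update:

```go
// RollingHash is a hypothetical interface for a rolling hash that has been
// separated out of the chunker. Any implementation (Rabin, buzhash, rolling
// Adler, ...) could be plugged into the chunker's scan loop.
type RollingHash interface {
	// Roll feeds the next input byte into the sliding window.
	Roll(b byte)
	// Sum64 returns the current fingerprint, which the chunker checks
	// against its boundary condition (e.g. sum&splitmask == 0).
	Sum64() uint64
	// Reset clears the window state before a new chunk is started.
	Reset()
}
```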
That's consistent with my memory. Btw, here is the output of benchcmp on the two benchmark runs:
benchmark old ns/op new ns/op delta
BenchmarkChunkerWithSHA256 122541194 141810242 +15.72%
BenchmarkChunkerWithMD5 82170451 103197901 +25.59%
BenchmarkChunker 66481903 81411794 +22.46%
BenchmarkNewChunker 77459 78330 +1.12%
BenchmarkPolDivMod 63.6 58.8 -7.55%
BenchmarkPolDiv 64.2 64.8 +0.93%
BenchmarkPolMod 65.7 64.1 -2.44%
BenchmarkPolDeg 41.6 40.5 -2.64%
BenchmarkRandomPolynomial 452257005 507418834 +12.20%
BenchmarkPolIrreducible 262396181 266704150 +1.64%
benchmark old MB/s new MB/s speedup
BenchmarkChunkerWithSHA256 85.57 73.94 0.86x
BenchmarkChunkerWithMD5 127.61 101.61 0.80x
BenchmarkChunker 157.72 128.80 0.82x
@bchapuis I have been maintaining a library of rolling hashes since early 2015. I recently realized that my implementation of the Rabin-Karp algorithm is flawed (which is somehow what brought me here, looking for good code), but you are welcome to have a look and contribute!
I have been porting this code to https://github.com/chmduquesne/rollinghash. It (mostly) works!
[Edit] It fully works now (passes all tests).
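For anyone comparing implementations, the property a correct Rabin-Karp rolling hash has to satisfy is that rolling the window forward by one byte yields exactly the same value as rehashing the new window from scratch. Here is a small, self-contained illustration of that check (the multiplier and window size are arbitrary; this is not code from either library):

```go
package main

import "fmt"

// base is an arbitrary multiplier chosen for illustration; real
// implementations use a random irreducible polynomial or a large prime.
const base = 257

// hash computes the Rabin-Karp hash of a window from scratch.
func hash(window []byte) uint64 {
	var h uint64
	for _, b := range window {
		h = h*base + uint64(b)
	}
	return h
}

// roll removes the outgoing byte and appends the incoming one in O(1):
// h_new = (h_old - out*base^(n-1)) * base + in, with n the window size.
// All arithmetic wraps mod 2^64, so the identity still holds.
func roll(h uint64, out, in byte, pow uint64) uint64 {
	return (h-uint64(out)*pow)*base + uint64(in)
}

func main() {
	data := []byte("abcdefgh")
	n := 4 // window size

	// pow = base^(n-1), the weight of the outgoing byte.
	pow := uint64(1)
	for i := 0; i < n-1; i++ {
		pow *= base
	}

	h := hash(data[:n])
	for i := n; i < len(data); i++ {
		h = roll(h, data[i-n], data[i], pow)
		// The rolled value must equal the hash computed from scratch.
		fmt.Println(h == hash(data[i-n+1 : i+1]))
	}
}
```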
I'm closing this issue; @chmduquesne's library provides the rolling hash function on its own. Thanks!
At the moment the rolling hash function is integrated into the chunker. I think it would make sense to separate the rolling hash function from the chunker and to declare interfaces for both. This would allow implementing and comparing different kinds of chunking algorithms (fixed size, thresholds, etc.) as well as different rolling hash functions (Rabin-Karp, buzhash, rolling Adler, etc.). A rough sketch of what the chunker side could look like follows.
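A minimal, purely illustrative sketch of such a chunker interface together with the simplest possible (fixed-size) implementation; none of these names exist in the current code, and a content-defined chunker built on a separated rolling hash would implement the same interface:

```go
package chunking

import "io"

// Chunk describes one piece of the input stream.
type Chunk struct {
	Start  uint64
	Length uint64
	Data   []byte
}

// Chunker splits a stream into chunks; content-defined and fixed-size
// chunkers would both implement this interface.
type Chunker interface {
	// Next returns the next chunk, or io.EOF when the stream is exhausted.
	Next() (Chunk, error)
}

// fixedChunker is the simplest possible implementation: cut every size bytes.
type fixedChunker struct {
	r     io.Reader
	size  int
	start uint64
}

func NewFixedChunker(r io.Reader, size int) Chunker {
	return &fixedChunker{r: r, size: size}
}

func (c *fixedChunker) Next() (Chunk, error) {
	buf := make([]byte, c.size)
	n, err := io.ReadFull(c.r, buf)
	if n == 0 {
		return Chunk{}, io.EOF
	}
	chunk := Chunk{Start: c.start, Length: uint64(n), Data: buf[:n]}
	c.start += uint64(n)
	if err == io.ErrUnexpectedEOF {
		err = nil // last, shorter chunk
	}
	return chunk, err
}
```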