lukechampine / blake3

An AVX-512 accelerated implementation of the BLAKE3 cryptographic hash function
MIT License

Splits g() into 2 inlined funcs. #2

Closed renthraysk closed 4 years ago

renthraysk commented 4 years ago

Eliminates function calls and bounds checking.

benchmark           old ns/op    new ns/op    delta
BenchmarkWrite-4    368095       212280       -42.33%
BenchmarkChunk-4    17496        10271        -41.30%
BenchmarkXOF-4      10.5         5.93         -43.52%

benchmark           old MB/s     new MB/s     speedup
BenchmarkWrite-4    89.02        154.36       1.73x
BenchmarkXOF-4      95.48        168.68       1.77x
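
For readers unfamiliar with the technique, here is a minimal sketch of what splitting the BLAKE3 mixing function into two inlinable halves might look like. It is not the code from this pull request: the names `gFull`, `gx`, and `gy` are hypothetical, and the sketch assumes the original `g()` indexed into a state array. Each half takes and returns plain `uint32` values, so there is no array indexing to bounds-check, and each body is small enough that the Go inliner is more likely to accept it.

```go
package main

import (
	"fmt"
	"math/bits"
)

// gFull is a monolithic mixing function (hypothetical name). It indexes
// into the state with runtime indices, so each access may be bounds-checked,
// and the body may be too large for the compiler to inline.
func gFull(v *[16]uint32, a, b, c, d int, mx, my uint32) {
	v[a] += v[b] + mx
	v[d] = bits.RotateLeft32(v[d]^v[a], -16)
	v[c] += v[d]
	v[b] = bits.RotateLeft32(v[b]^v[c], -12)
	v[a] += v[b] + my
	v[d] = bits.RotateLeft32(v[d]^v[a], -8)
	v[c] += v[d]
	v[b] = bits.RotateLeft32(v[b]^v[c], -7)
}

// gx is the first half of the mix (hypothetical name). It works on values
// only, so no slice or array access occurs inside it.
func gx(a, b, c, d, mx uint32) (uint32, uint32, uint32, uint32) {
	a += b + mx
	d = bits.RotateLeft32(d^a, -16) // negative count rotates right
	c += d
	b = bits.RotateLeft32(b^c, -12)
	return a, b, c, d
}

// gy is the second half of the mix (hypothetical name).
func gy(a, b, c, d, my uint32) (uint32, uint32, uint32, uint32) {
	a += b + my
	d = bits.RotateLeft32(d^a, -8)
	c += d
	b = bits.RotateLeft32(b^c, -7)
	return a, b, c, d
}

func main() {
	// Arbitrary test state and message words, not real BLAKE3 inputs.
	var v [16]uint32
	for i := range v {
		v[i] = uint32(i)*0x9e3779b9 + 1
	}
	mx, my := uint32(0xdeadbeef), uint32(0x01234567)

	// Split version: load the column into locals using constant indices.
	a, b, c, d := v[0], v[4], v[8], v[12]
	a, b, c, d = gx(a, b, c, d, mx)
	a, b, c, d = gy(a, b, c, d, my)

	// Monolithic version on the same column, for comparison.
	gFull(&v, 0, 4, 8, 12, mx, my)

	match := a == v[0] && b == v[4] && c == v[8] && d == v[12]
	fmt.Println("split and monolithic versions agree:", match)
}
```

Whether each half is actually inlined depends on the compiler version; building with `go build -gcflags=-m` reports the inlining decisions.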

lukechampine commented 4 years ago

Wonderful, thank you! I knew there was some way to improve the inlining.

There was a (trivial) merge conflict, so I rebased and merged locally.