lukechampine / blake3

An AVX-512 accelerated implementation of the BLAKE3 cryptographic hash function
MIT License

feat: improve performance #3

Closed orisano closed 4 years ago

orisano commented 4 years ago

name      old speed      new speed      delta
Write-4    221MB/s ± 1%   253MB/s ± 0%  +14.43%  (p=0.000 n=43+9)
Sum256-4   223MB/s ± 1%   259MB/s ± 0%  +15.94%  (p=0.000 n=41+9)
XOF-4      236MB/s ± 0%   270MB/s ± 0%  +14.38%  (p=0.000 n=44+8)

name      old alloc/op   new alloc/op   delta
Write-4      0.00B          0.00B          ~     (all equal)
Sum256-4     0.00B          0.00B          ~     (all equal)
XOF-4        0.00B          0.00B          ~     (all equal)

name      old allocs/op  new allocs/op  delta
Write-4       0.00           0.00          ~     (all equal)
Sum256-4      0.00           0.00          ~     (all equal)
XOF-4         0.00           0.00          ~     (all equal)
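
For readers unfamiliar with where the speed and alloc columns above come from, here is a minimal sketch of a Go benchmark that reports them. It is not the benchmark code from this repository: `crypto/sha256` stands in for the hash under test, and the names `benchmarkSum256` and `sink` and the 1 KiB buffer size are assumptions chosen for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// sink keeps the compiler from optimizing the hash call away.
var sink [32]byte

// benchmarkSum256 hashes a fixed buffer once per iteration.
// sha256 is only a stand-in here; the buffer size is an assumption.
func benchmarkSum256(b *testing.B) {
	buf := make([]byte, 1024)
	b.SetBytes(int64(len(buf))) // enables the MB/s ("speed") column
	b.ReportAllocs()            // enables the alloc/op and allocs/op columns
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = sha256.Sum256(buf)
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	r := testing.Benchmark(benchmarkSum256)
	fmt.Println(r.String(), r.MemString())
}
```

In a real comparison, the same benchmark would be run several times before and after the change, and the two result files compared with a tool such as benchstat to obtain the delta and p-value columns.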

lukechampine commented 4 years ago

Thanks for your contribution! Unfortunately this code conflicts with #4, which yields better performance, so I have to reject it. However, your commit https://github.com/lukechampine/blake3/pull/3/commits/436a80c77740c29e9bb86f7d252eea4941823008 inspired me to try unrolling the word<->byte conversions, which gave a nice 15% performance increase! :)

name      old time/op    new time/op    delta
Write-4     3.02ns ± 2%    2.61ns ± 2%  -13.59%  (p=0.008 n=5+5)
Sum256-4    3.17µs ± 0%    2.82µs ± 4%  -11.18%  (p=0.016 n=4+5)
XOF-4       2.90ns ± 2%    2.51ns ± 1%  -13.53%  (p=0.008 n=5+5)

name      old speed      new speed      delta
Write-4    332MB/s ± 2%   384MB/s ± 2%  +15.68%  (p=0.008 n=5+5)
XOF-4      345MB/s ± 2%   399MB/s ± 1%  +15.64%  (p=0.008 n=5+5)
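
To illustrate what unrolling the word<->byte conversions might look like, here is a minimal sketch, assuming the conversion in question turns a 64-byte BLAKE3 block into sixteen little-endian 32-bit words. The function names `bytesToWordsLoop` and `bytesToWordsUnrolled` are hypothetical and are not taken from the repository; the actual commit may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToWordsLoop converts a 64-byte block into 16 little-endian
// words using a plain loop.
func bytesToWordsLoop(block *[64]byte) (words [16]uint32) {
	for i := range words {
		words[i] = binary.LittleEndian.Uint32(block[i*4:])
	}
	return words
}

// bytesToWordsUnrolled does the same conversion with the loop written
// out by hand, so every slice offset is a compile-time constant.
func bytesToWordsUnrolled(block *[64]byte) (words [16]uint32) {
	words[0] = binary.LittleEndian.Uint32(block[0:])
	words[1] = binary.LittleEndian.Uint32(block[4:])
	words[2] = binary.LittleEndian.Uint32(block[8:])
	words[3] = binary.LittleEndian.Uint32(block[12:])
	words[4] = binary.LittleEndian.Uint32(block[16:])
	words[5] = binary.LittleEndian.Uint32(block[20:])
	words[6] = binary.LittleEndian.Uint32(block[24:])
	words[7] = binary.LittleEndian.Uint32(block[28:])
	words[8] = binary.LittleEndian.Uint32(block[32:])
	words[9] = binary.LittleEndian.Uint32(block[36:])
	words[10] = binary.LittleEndian.Uint32(block[40:])
	words[11] = binary.LittleEndian.Uint32(block[44:])
	words[12] = binary.LittleEndian.Uint32(block[48:])
	words[13] = binary.LittleEndian.Uint32(block[52:])
	words[14] = binary.LittleEndian.Uint32(block[56:])
	words[15] = binary.LittleEndian.Uint32(block[60:])
	return words
}

func main() {
	var block [64]byte
	for i := range block {
		block[i] = byte(i)
	}
	// Both versions must produce identical words.
	fmt.Println(bytesToWordsLoop(&block) == bytesToWordsUnrolled(&block))
}
```

Whether the unrolled form is faster depends on the compiler version and target, so any gain should be confirmed with benchmarks like the ones above rather than assumed.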