orisano closed this 4 years ago
Thanks for your contribution! Unfortunately, this code conflicts with #4, which yields better performance, so I have to reject it. However, your commit https://github.com/lukechampine/blake3/pull/3/commits/436a80c77740c29e9bb86f7d252eea4941823008 inspired me to try unrolling the word<->byte conversions, which gave a nice ~15% throughput increase! :)
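For readers unfamiliar with the trick, here is a minimal sketch of what "unrolling the word<->byte conversions" means. It assumes a 64-byte block decoded into 16 little-endian `uint32` words, as in BLAKE3's compression function. The function names are hypothetical and this is not the library's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToWordsLoop (hypothetical) decodes a 64-byte block into 16
// little-endian 32-bit words with an ordinary loop: the "rolled" baseline.
func bytesToWordsLoop(b *[64]byte, w *[16]uint32) {
	for i := range w {
		w[i] = binary.LittleEndian.Uint32(b[i*4:])
	}
}

// bytesToWordsUnrolled (hypothetical) performs the same conversion with the
// loop fully unrolled. Constant offsets on fixed-size arrays may let the
// compiler drop the loop counter and eliminate bounds checks.
func bytesToWordsUnrolled(b *[64]byte, w *[16]uint32) {
	w[0] = binary.LittleEndian.Uint32(b[0:])
	w[1] = binary.LittleEndian.Uint32(b[4:])
	w[2] = binary.LittleEndian.Uint32(b[8:])
	w[3] = binary.LittleEndian.Uint32(b[12:])
	w[4] = binary.LittleEndian.Uint32(b[16:])
	w[5] = binary.LittleEndian.Uint32(b[20:])
	w[6] = binary.LittleEndian.Uint32(b[24:])
	w[7] = binary.LittleEndian.Uint32(b[28:])
	w[8] = binary.LittleEndian.Uint32(b[32:])
	w[9] = binary.LittleEndian.Uint32(b[36:])
	w[10] = binary.LittleEndian.Uint32(b[40:])
	w[11] = binary.LittleEndian.Uint32(b[44:])
	w[12] = binary.LittleEndian.Uint32(b[48:])
	w[13] = binary.LittleEndian.Uint32(b[52:])
	w[14] = binary.LittleEndian.Uint32(b[56:])
	w[15] = binary.LittleEndian.Uint32(b[60:])
}

// wordsToBytesUnrolled (hypothetical) is the inverse: it encodes 16 words
// back into 64 little-endian bytes, again without a loop.
func wordsToBytesUnrolled(w *[16]uint32, b *[64]byte) {
	binary.LittleEndian.PutUint32(b[0:], w[0])
	binary.LittleEndian.PutUint32(b[4:], w[1])
	binary.LittleEndian.PutUint32(b[8:], w[2])
	binary.LittleEndian.PutUint32(b[12:], w[3])
	binary.LittleEndian.PutUint32(b[16:], w[4])
	binary.LittleEndian.PutUint32(b[20:], w[5])
	binary.LittleEndian.PutUint32(b[24:], w[6])
	binary.LittleEndian.PutUint32(b[28:], w[7])
	binary.LittleEndian.PutUint32(b[32:], w[8])
	binary.LittleEndian.PutUint32(b[36:], w[9])
	binary.LittleEndian.PutUint32(b[40:], w[10])
	binary.LittleEndian.PutUint32(b[44:], w[11])
	binary.LittleEndian.PutUint32(b[48:], w[12])
	binary.LittleEndian.PutUint32(b[52:], w[13])
	binary.LittleEndian.PutUint32(b[56:], w[14])
	binary.LittleEndian.PutUint32(b[60:], w[15])
}

func main() {
	// Fill a block with 0, 1, 2, ... so every word is distinct.
	var block [64]byte
	for i := range block {
		block[i] = byte(i)
	}

	// Both decoders must agree word for word.
	var rolled, unrolled [16]uint32
	bytesToWordsLoop(&block, &rolled)
	bytesToWordsUnrolled(&block, &unrolled)
	fmt.Println(rolled == unrolled) // true

	// Encoding the words again must reproduce the original block.
	var out [64]byte
	wordsToBytesUnrolled(&unrolled, &out)
	fmt.Println(out == block) // true
}
```

The unrolled form trades code size for fewer branches and index computations per block. Any speedup depends on the compiler and CPU, so it should be confirmed with benchmarks such as the ones below.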
```
name      old time/op    new time/op    delta
Write-4   3.02ns ± 2%    2.61ns ± 2%    -13.59%  (p=0.008 n=5+5)
Sum256-4  3.17µs ± 0%    2.82µs ± 4%    -11.18%  (p=0.016 n=4+5)
XOF-4     2.90ns ± 2%    2.51ns ± 1%    -13.53%  (p=0.008 n=5+5)

name      old speed      new speed      delta
Write-4   332MB/s ± 2%   384MB/s ± 2%   +15.68%  (p=0.008 n=5+5)
XOF-4     345MB/s ± 2%   399MB/s ± 1%   +15.64%  (p=0.008 n=5+5)
```
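The ~13.6% drop in time/op and the ~15.7% rise in speed describe the same improvement. Assuming each benchmark processes a fixed number of bytes per op, speed is that byte count divided by time/op, so the speed ratio is the reciprocal of the time ratio. For Write-4:

```math
\frac{v_\text{new}}{v_\text{old}} = \frac{t_\text{old}}{t_\text{new}} = \frac{1}{1 - 0.1359} \approx 1.157
```

That is about +15.7%, matching the reported +15.68% up to rounding. The same calculation for XOF-4 gives $1/(1 - 0.1353) \approx 1.156$, matching +15.64%.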
```
name      old speed      new speed      delta
Write-4   221MB/s ± 1%   253MB/s ± 0%   +14.43%  (p=0.000 n=43+9)
Sum256-4  223MB/s ± 1%   259MB/s ± 0%   +15.94%  (p=0.000 n=41+9)
XOF-4     236MB/s ± 0%   270MB/s ± 0%   +14.38%  (p=0.000 n=44+8)

name      old alloc/op   new alloc/op   delta
Write-4   0.00B          0.00B          ~     (all equal)
Sum256-4  0.00B          0.00B          ~     (all equal)
XOF-4     0.00B          0.00B          ~     (all equal)

name      old allocs/op  new allocs/op  delta
Write-4   0.00           0.00           ~     (all equal)
Sum256-4  0.00           0.00           ~     (all equal)
XOF-4     0.00           0.00           ~     (all equal)
```