Closed workingjubilee closed 3 years ago
Thank you! This looks good. I'll hold off on landing it until I have some time to read through it carefully, but this is definitely a change I was interested in making.
The built-in benchmark harness does tend to be a little finicky, since it doesn't include warm-up iterations or anything like that. `bench_long_blake2sp` should be mostly equivalent to `bench_long_blake2s_many_8x`, so I think what we're seeing is that `bench_long_blake2sp` underperformed by half on your main branch for "some random reason". I'm curious whether that blip might disappear if you ran it ten times in a row or something like that?
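For anyone wanting to rule out that kind of blip by hand, here is a rough sketch of warm-up plus repeated timing using only the standard library. It is not how the built-in harness works; `median_time` and `workload` are hypothetical names, and `workload` is a stand-in for a hashing call, not the real BLAKE2 code.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` untimed `warmup` times, then times it `runs` times and
/// returns the median sample. (Hypothetical helper, not part of any harness.)
fn median_time<F: FnMut()>(mut f: F, warmup: usize, runs: usize) -> Duration {
    assert!(runs > 0, "need at least one timed run");
    // Warm-up iterations: let caches and CPU frequency settle first.
    for _ in 0..warmup {
        f();
    }
    // Timed iterations: collect one sample per run.
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    // The median is less sensitive to a single outlier run than the mean.
    samples[samples.len() / 2]
}

/// Stand-in workload (assumption): a cheap fold over the input bytes.
fn workload(input: &[u8]) -> u64 {
    input
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

fn main() {
    let input = vec![0xABu8; 1 << 20];
    let median = median_time(
        || {
            // black_box keeps the optimizer from deleting the work.
            black_box(workload(black_box(&input)));
        },
        3,
        10,
    );
    println!("median over 10 runs: {:?}", median);
}
```

If the median stays stable across several invocations while a single run occasionally halves or doubles, the outlier is more likely noise than a real regression.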
In spite of the large diff, this is a fairly small actual change: it just uses arrayvec 0.7 everywhere and, instead of using the array type parameter, uses <T, const N: usize> parameters.
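To illustrate the shape of that change without pulling in the crate, here is a small std-only sketch. `FixedVec` is a hypothetical stand-in, not arrayvec's implementation; it only mirrors the `<T, const N: usize>` signature style, where earlier arrayvec versions took a single array type parameter such as `ArrayVec<[T; N]>`.

```rust
/// Hypothetical fixed-capacity vector, parameterized the const-generic way:
/// element type `T` and capacity `N` are separate parameters.
struct FixedVec<T, const N: usize> {
    items: [T; N],
    len: usize,
}

impl<T: Copy + Default, const N: usize> FixedVec<T, N> {
    /// Creates an empty vector; slots are pre-filled with `T::default()`.
    fn new() -> Self {
        Self { items: [T::default(); N], len: 0 }
    }

    /// Appends `value`, or hands it back in `Err` when the vector is full.
    fn try_push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = value;
        self.len += 1;
        Ok(())
    }

    /// Returns only the initialized prefix.
    fn as_slice(&self) -> &[T] {
        &self.items[..self.len]
    }

    /// The capacity is the const parameter itself.
    fn capacity(&self) -> usize {
        N
    }
}

fn main() {
    // The capacity appears directly in the type, not wrapped in an array type.
    let mut v: FixedVec<u32, 8> = FixedVec::new();
    for i in 0..8 {
        v.try_push(i).expect("capacity is 8");
    }
    // A ninth push is rejected and the value is returned to the caller.
    assert_eq!(v.try_push(99), Err(99));
    println!("{} of {} slots used: {:?}", v.as_slice().len(), v.capacity(), v.as_slice());
}
```

Most of a diff like this tends to be mechanical: every signature that named the array type has to name the element type and the capacity separately.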
Bench difference appears to be largely negligible, with an admittedly notable hit on two benches and an explosive improvement on bench_long_blake2sp! No idea why! But I use an AMD processor and did not exhaustively bench and profile this; I just ran it a few times on each to make sure the diffs were roughly constant, so please feel free to do your own review.