Unoptimized BLAKE2b

Issue reported in the context of Kudelski Security's audit

The implementation does not leverage vectorized instructions. For example, on platforms supporting AVX2, a portable reference implementation is about 40% slower than an AVX2 implementation, according to a SUPERCOP benchmark on the Cannonlake microarchitecture.
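To illustrate where the vectorization opportunity comes from, here is a minimal scalar sketch of the BLAKE2b mixing function G and one "column step" of a round, following RFC 7693. The function names (`rotr64`, `blake2b_g`, `blake2b_column_step`) and the `mx`/`my` parameter layout are illustrative assumptions, not taken from RandomX or any particular library. The four G calls in the column step touch disjoint state words, which is what allows an AVX2 implementation to compute them together in the four 64-bit lanes of a 256-bit register.

```c
#include <stdint.h>

/* Rotate a 64-bit word right by n bits (valid for 0 < n < 64). */
static inline uint64_t rotr64(uint64_t x, unsigned n) {
    return (x >> n) | (x << (64 - n));
}

/* BLAKE2b mixing function G; rotation constants 32, 24, 16, 63 (RFC 7693). */
static inline void blake2b_g(uint64_t *a, uint64_t *b, uint64_t *c, uint64_t *d,
                             uint64_t x, uint64_t y) {
    *a = *a + *b + x;
    *d = rotr64(*d ^ *a, 32);
    *c = *c + *d;
    *b = rotr64(*b ^ *c, 24);
    *a = *a + *b + y;
    *d = rotr64(*d ^ *a, 16);
    *c = *c + *d;
    *b = rotr64(*b ^ *c, 63);
}

/* Hypothetical helper: the column half of one round on the 16-word state v.
 * mx[i] and my[i] are the two message words for column i, already selected
 * by the caller according to the round's permutation.
 * The four iterations are independent of each other: iteration i only
 * touches v[i], v[4+i], v[8+i], v[12+i]. */
static void blake2b_column_step(uint64_t v[16],
                                const uint64_t mx[4], const uint64_t my[4]) {
    for (int i = 0; i < 4; i++) {
        blake2b_g(&v[i], &v[4 + i], &v[8 + i], &v[12 + i], mx[i], my[i]);
    }
}
```

A vectorized version would hold each state row (`v[0..3]`, `v[4..7]`, and so on) in one 256-bit register and replace the loop with a single sequence of vector adds, XORs and rotates; the diagonal step then needs lane rotations of the rows before applying the same sequence.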

An AVX2 implementation of BLAKE2b can be found in the SUPERCOP archive as well as in libsodium. An AVX512-optimized version of BLAKE2s (not BLAKE2b) is used in WireGuard. Similar techniques may be used to optimize BLAKE2b for the AVX512 instruction set.
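One common way to ship an optimized implementation alongside the portable one is to pick a backend at run time based on CPU features. The sketch below assumes GCC or Clang on x86, where `__builtin_cpu_init` and `__builtin_cpu_supports` are available; the enum and function names are hypothetical and do not reflect how RandomX, libsodium, or SUPERCOP actually structure their dispatch.

```c
/* Hypothetical backend identifiers for illustration only. */
typedef enum {
    BLAKE2B_PORTABLE,
    BLAKE2B_AVX2
} blake2b_backend;

/* Choose a backend once at startup. Falls back to the portable code when
 * the compiler builtins are unavailable or the CPU lacks AVX2. */
static blake2b_backend blake2b_select_backend(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return BLAKE2B_AVX2;
    }
#endif
    return BLAKE2B_PORTABLE;
}

/* Human-readable name, e.g. for logging which path was selected. */
static const char *blake2b_backend_name(blake2b_backend b) {
    return b == BLAKE2B_AVX2 ? "avx2" : "portable";
}
```

A caller would typically run `blake2b_select_backend()` once, store the result (or a function pointer to the matching compression routine), and reuse it for every hash so that the feature check stays off the hot path.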

tevador / RandomX

Unoptimized BLAKE2b #60