Closed by mratsim 3 years ago
The speedup with BLST SHA256 is 14x~15x
In particular, hashTreeRoot is one of our bottlenecks in NBC, and the BeaconState is 5MB.
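To illustrate why SHA256 throughput dominates here, below is a minimal sketch of a binary Merkle root over 32-byte chunks. It is a simplification and not the NBC implementation: real SSZ hashTreeRoot walks typed fields rather than a flat buffer, and the names `merkle_root` and `ZERO_CHUNK` are hypothetical. It also returns how many SHA256 calls were made.

```python
import hashlib

ZERO_CHUNK = b"\x00" * 32  # hypothetical padding chunk for this sketch


def merkle_root(chunks):
    """Return (root, sha256_call_count) for a list of 32-byte chunks.

    Simplified flat merkleization: pad to a power of two with zero
    chunks, then hash adjacent pairs until a single root remains.
    """
    if not chunks:
        chunks = [ZERO_CHUNK]
    # Pad the leaf layer up to the next power of two.
    size = 1
    while size < len(chunks):
        size *= 2
    layer = list(chunks) + [ZERO_CHUNK] * (size - len(chunks))

    calls = 0
    while len(layer) > 1:
        # Each parent is SHA256(left || right), i.e. a 64-byte input.
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
        calls += len(layer)
    return layer[0], calls


if __name__ == "__main__":
    leaves = [bytes([i]) * 32 for i in range(4)]
    root, calls = merkle_root(leaves)
    print(root.hex(), calls)
```

In this naive scheme, a flat 5MB buffer is roughly 160,000 chunks, which means hundreds of thousands of SHA256 calls per root, so any speedup in the hash function carries over almost directly.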
Is that using some fancy instructions or plain code?
It uses SSE3 for vectorized xor, shuffles and vectorized shift right:
https://github.com/supranational/blst/blob/master/build/elf/sha256-x86_64.s#L359-L362
This is portable to ARM, since it has equivalent vector instructions.
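For context, the operations mentioned above (xor, rotations or shuffles, and right shifts) are what the standard SHA256 message schedule is built from. The scalar sketch below follows the SHA256 specification, not BLST's assembly; the function names are hypothetical, and how BLST maps each step onto vector instructions is not shown here. A SIMD version would compute several schedule words per instruction instead of one at a time.

```python
MASK32 = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x):
    # Two rotations and one plain right shift, combined with xor.
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def expand_schedule(block):
    """Expand one 64-byte block into the 64-word SHA256 message schedule."""
    assert len(block) == 64
    # The first 16 words are the block read as big-endian 32-bit integers
    # (on little-endian CPUs this byte swap is a typical use of a shuffle).
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        w.append(
            (small_sigma1(w[t - 2]) + w[t - 7]
             + small_sigma0(w[t - 15]) + w[t - 16]) & MASK32
        )
    return w


if __name__ == "__main__":
    # Padded single block for the message b"abc" (length 24 bits).
    block = b"abc" + b"\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
    print([hex(v) for v in expand_schedule(block)[:18]])
```

Each of the 48 expanded words depends only on earlier words, with xor, rotate, and shift as the only bitwise steps, which is why this part of the hash lends itself to vector units on both x86 and ARM.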
This updates BLST from commit Reference diff:
State
Use -D__BLST_PORTABLE__ instead of falling back to Milagro if the CPU doesn't support SSE3