slothy-optimizer / slothy

Assembly super-optimization via constraint solving
https://slothy-optimizer.github.io/slothy/

Add Keccak #65

Open mkannwischer opened 4 months ago

mkannwischer commented 4 months ago

WIP adding Keccak via SLOTHY.

Right now this is a hybrid 4x Keccak (2 scalar, 2 Neon). I de-interleaved the previous manually interleaved code and optimized it via SLOTHY. There is still a lot of potential for refactoring.

In the current state (https://github.com/slothy-optimizer/pqax/commit/c69030c65e205fc585026265bcc492da41f85024), the results look as follows:

[0|5|25|50|75|95|100] = [(7670) | 7671 | 7671 |* 7672 *| 7675 | 7697 | (7709)] (100-th AVGs of keccak_f1600_x4_hybrid_slothy)
[0|5|25|50|75|95|100] = [(6623) | 6624 | 6624 |* 6624 *| 6628 | 6646 | (6672)] (100-th AVGs of keccak_f1600_x4_hybrid_slothy_opt_a55)

For reference:

The 6624 is already quite a bit faster than the 7288 reported in https://kannwischer.eu/papers/2022_armv8keccak.pdf. It is still slower than four runs of the 1x scalar implementation from the same paper, which was 1418 (1418*4 = 5672).
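
To make the comparison concrete, here is a small sketch of the arithmetic, using only the numbers quoted in this thread. The constant names and the `relative_change` helper are made up for illustration; they are not part of SLOTHY or pqax.

```python
# Numbers quoted in this thread, in the units reported by the benchmark.
# All names below are hypothetical and exist only for this illustration.
SLOTHY_X4_MEDIAN = 7672          # keccak_f1600_x4_hybrid_slothy, 50th percentile
SLOTHY_X4_OPT_A55_MEDIAN = 6624  # keccak_f1600_x4_hybrid_slothy_opt_a55, 50th percentile
PAPER_REFERENCE = 7288           # figure quoted from the paper
PAPER_SCALAR_X1 = 1418           # 1x scalar figure quoted from the paper


def relative_change(new: float, old: float) -> float:
    """Return (new - old) / old; a negative value means `new` is lower (faster)."""
    return (new - old) / old


# Four back-to-back runs of the 1x scalar code: 1418 * 4 = 5672.
scalar_x4 = 4 * PAPER_SCALAR_X1

# Cost per Keccak state of the optimized 4x hybrid.
per_state = SLOTHY_X4_OPT_A55_MEDIAN / 4

print(f"4 x scalar:            {scalar_x4}")
print(f"opt_a55 per state:     {per_state:.0f}")
print(f"opt_a55 vs. unopt:     {relative_change(SLOTHY_X4_OPT_A55_MEDIAN, SLOTHY_X4_MEDIAN):+.1%}")
print(f"opt_a55 vs. paper:     {relative_change(SLOTHY_X4_OPT_A55_MEDIAN, PAPER_REFERENCE):+.1%}")
print(f"opt_a55 vs. 4x scalar: {relative_change(SLOTHY_X4_OPT_A55_MEDIAN, scalar_x4):+.1%}")
```

Under these numbers, the optimized hybrid comes out at 1656 per state against 1418 for the scalar code, so the remaining gap to 4x scalar is roughly 17%.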

Related to https://github.com/slothy-optimizer/pqax/pull/6

hanno-becker commented 4 months ago

Rebase on top of #81