penumbra-zone / poseidon377

An instantiation of the Poseidon hash for use with decaf377
https://protocol.penumbra.zone/main/crypto/poseidon.html
Other
28 stars 10 forks source link

perf: speeding up `poseidon-permutation` #21

Closed redshiftzero closed 2 years ago

redshiftzero commented 2 years ago

This PR contains some performance fixes in the implementation of poseidon-permutation. Below are some numbers based on benchmarking and profiling. Related to https://github.com/penumbra-zone/penumbra/issues/1123.

From the paper the optimized version of Poseidon reduces the number of multiplications in the partial rounds.

The approximate "5x" number cited in the paper came from a Poseidon instance with t=24, R_F = 8, R_P = 42, so the number of multiplications per permutation was: t^2 (R_F + R_P) = 24^2 (8 + 42) = 28,800 multiplications After the optimization, the number of multiplications in the partial rounds goes from t^2 to 2t, so after: t^2 R_F + 2 t R_P = 24^2 8 + 2 24 * 42 = 6,624 multiplications Which is about a 4.3x reduction in multiplications in the permutation. There are big gains here because the instance is very high t, very high R_P (partial round).

For us testing with the highest width hash we use, 4:1, we have t=5, R_F = 8, R_P = 31 so for us the expected reduction in the number of multiplications: Before: t^2 (R_F + R_P) = 5^2 (8 + 31) = 975 multiplications After: t^2 R_F + 2 t R_P = 5^2 8 + 5 2 31 = 510 multiplications Which is about a 1.9x reduction in multiplications.

Comparing the empirical performance for the 4:1 parameter set derived using poseidon-paramgen: ark-sponge: ~39.5us/hash unoptimized poseidon-permutation (this repo): ~36.0us/hash optimized poseidon-permutation (this repo): ~26.8us/hash

At this point from profiling the computation for the optimized permutation, we're dominated by applying the Sboxes, which take about 60% of the computation of the permutation. The partial round multiplication is down to 26%, so if further improvements are made to that area of the code we'll get some additional speedups.