starkware-libs / stwo

Optim: bit-reversal permutation with cache-blocking #813

Open mratsim opened 2 weeks ago

mratsim commented 2 weeks ago

Hey team,

I've been looking around the repo and I've seen that you consider bit-reversal permutations important enough to track them in benchmarks.

There is also a mention of a cache-friendly algorithm: https://github.com/starkware-libs/stwo/blob/387a072dd7f4a56de3e196b779f9e392dfdd9406/crates/prover/src/core/utils.rs#L136-L153

The following in-place algorithm uses cache-blocking and is 33% faster than the naive one (on my machine, at EIP-4844 sizes):

https://github.com/mratsim/constantine/blob/65147ed/constantine/math/polynomials/fft.nim#L203-L295
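To make the idea concrete without opening the Nim file, here is a minimal Rust sketch of a tile-based, in-place bit-reversal next to the naive one. It is my own illustration of the general technique, not a port of the linked code and not stwo's implementation. The names `reverse_bits`, `bit_reverse_naive`, `bit_reverse_blocked` and the tile size `LOG_BLOCK` are assumptions made for this example, and the tile size is untuned. Each index is split into high, middle and low bit groups; a small scratch tile keeps the scattered accesses inside a cache-sized working set.

```rust
/// Reverse the lowest `bits` bits of `x` (hypothetical helper).
fn reverse_bits(x: usize, bits: u32) -> usize {
    if bits == 0 { 0 } else { x.reverse_bits() >> (usize::BITS - bits) }
}

/// Naive in-place bit-reversal: one scattered swap per index pair.
fn bit_reverse_naive<T>(v: &mut [T]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let log_n = n.trailing_zeros();
    for i in 0..n {
        let j = reverse_bits(i, log_n);
        if i < j {
            v.swap(i, j);
        }
    }
}

/// Assumed tile width of 2^4 = 16, so the scratch tile holds 256 elements.
const LOG_BLOCK: u32 = 4;

/// Cache-blocked in-place bit-reversal (sketch).
/// An index is viewed as `a | b | c`: `a` = top q bits, `c` = low q bits,
/// `b` = the remaining middle bits. Its reversal is `rev(c) | rev(b) | rev(a)`.
fn bit_reverse_blocked<T: Copy + Default>(v: &mut [T]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let log_n = n.trailing_zeros();
    let q = LOG_BLOCK;
    if log_n < 2 * q {
        // Too small to tile: fall back to the naive loop.
        return bit_reverse_naive(v);
    }
    let block = 1usize << q;
    let log_mid = log_n - 2 * q;
    let hi_shift = log_n - q;
    let mut tile = vec![T::default(); block * block];

    for b in 0..(1usize << log_mid) {
        let b_rev = reverse_bits(b, log_mid);
        // 1. Gather v[a|b|c] into tile[rev(a)|c]; reads are contiguous in c.
        for a in 0..block {
            let a_rev = reverse_bits(a, q);
            for c in 0..block {
                tile[(a_rev << q) | c] = v[(a << hi_shift) | (b << q) | c];
            }
        }
        // 2. Exchange tile entries with their destinations; writes to v are
        //    contiguous in rev(a). Only handle each pair once (idx < idx_rev).
        for c in 0..block {
            let c_rev = reverse_bits(c, q);
            for a_rev in 0..block {
                let a = reverse_bits(a_rev, q);
                let idx = (a << hi_shift) | (b << q) | c;
                let idx_rev = (c_rev << hi_shift) | (b_rev << q) | a_rev;
                if idx < idx_rev {
                    std::mem::swap(&mut v[idx_rev], &mut tile[(a_rev << q) | c]);
                }
            }
        }
        // 3. Write the values picked up in step 2 back to their source slots.
        for a in 0..block {
            let a_rev = reverse_bits(a, q);
            for c in 0..block {
                let c_rev = reverse_bits(c, q);
                let idx = (a << hi_shift) | (b << q) | c;
                let idx_rev = (c_rev << hi_shift) | (b_rev << q) | a_rev;
                if idx < idx_rev {
                    std::mem::swap(&mut v[idx], &mut tile[(a_rev << q) | c]);
                }
            }
        }
    }
}
```

Both functions take a mutable slice whose length is a power of two and should produce the same permutation; whether the blocked variant actually wins depends on element size, input size and the machine, so it needs benchmarking before drawing conclusions.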

Reference papers

The performance improvement has been independently confirmed in Gnark (https://github.com/Consensys/gnark-crypto/pull/446) on x86, though it's slower than naive on Apple, probably due to the significant memory bandwidth there.

Image courtesy of @gbotrel (AMD desktop): [image]