Closed JunhaoHuang closed 5 months ago
Thanks @JunhaoHuang for opening this PR! Looks great. We'll test this and rebenchmark it soon.
@dop-amin is working on stack optimizations so we finally have state-of-the-art low stack implementations in pqm4. That will be great combined with this PR.
Sorry for the long delay. I just pushed the benchmarks. The diff makes the speed-up look smaller than it actually is. That's because the old benchmarks ran only 100 iterations and seem to have gotten quite lucky. If you run more iterations, the old code is at around 6.2M cycles while the new one takes around 5.9M.
| scheme | implementation | key generation [cycles] | sign [cycles] | verify [cycles] |
|---|---|---|---|---|
| dilithium3 (1000 executions) | m4f | AVG: 2,516,008 MIN: 2,514,692 MAX: 2,527,617 | AVG: 6,181,249 MIN: 2,935,143 MAX: 26,805,985 | AVG: 2,411,260 MIN: 2,410,878 MAX: 2,411,645 |
Thank you for your help in merging this PR @mkannwischer!
Hi @mkannwischer,
I just noticed that you created issue #329 to integrate our tches2024 artifact. This PR includes code that replaces the NTT 769 with Plantard arithmetic on Cortex-M4. This update will only lead to a small speed-up for the `crypto_sign_signature()` of Dilithium3. As for our Keccak implementation, it has already been merged into pqm4 in #254 by our coauthor @aadomn.
Thank you very much for your attention to our work!
Junhao
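For readers unfamiliar with the Plantard arithmetic mentioned above, here is a minimal portable C sketch of a signed Plantard multiplication modulo 769. It is not the Cortex-M4 assembly from this PR. The names `PLANT_Q`, `PLANT_ALPHA`, `PLANT_L`, `inverse_mod_2_32`, `plantard_mul`, and `plantard_selftest` are hypothetical, and the choice of alpha = 5 with 16-bit halves is an assumption that merely satisfies q < 2^(l - alpha - 1) for q = 769. The result is congruent to -a*b*2^(-32) modulo q, so a real NTT would fold that factor into precomputed twiddle constants.

```c
#include <stdint.h>

/* Illustrative parameters (assumptions, not taken from the PR). */
#define PLANT_Q     769   /* modulus of the small NTT */
#define PLANT_ALPHA 5     /* inputs are allowed in [-q*2^alpha, q*2^alpha] */
#define PLANT_L     16    /* half of the 32-bit word size */

/* q^{-1} mod 2^32 for odd q, via Newton iteration.
 * x = q is already correct modulo 8; each step doubles the valid bits. */
static uint32_t inverse_mod_2_32(uint32_t q) {
    uint32_t x = q;
    for (int i = 0; i < 5; i++) {
        x *= 2u - q * x;
    }
    return x;
}

/* Signed Plantard multiplication: returns r with
 * r * 2^32 + a * b == 0 (mod q) and |r| < q/2.
 * Relies on arithmetic right shift of negative values, which is
 * implementation-defined in C but holds for GCC and Clang. */
static int32_t plantard_mul(int32_t a, int32_t b, uint32_t qinv) {
    uint32_t p = (uint32_t)a * (uint32_t)b * qinv; /* a*b*q^{-1} mod 2^32 */
    int32_t hi = (int32_t)p >> PLANT_L;            /* signed top 16 bits */
    return ((hi + (1 << PLANT_ALPHA)) * PLANT_Q) >> PLANT_L;
}

/* Checks the congruence and the output range on a grid of inputs.
 * Returns the number of failing input pairs. */
static int plantard_selftest(void) {
    const int32_t bound = PLANT_Q << PLANT_ALPHA;
    const uint32_t qinv = inverse_mod_2_32(PLANT_Q);
    int failures = 0;
    for (int32_t a = -bound; a <= bound; a += 97) {
        for (int32_t b = -bound; b <= bound; b += 101) {
            int32_t r = plantard_mul(a, b, qinv);
            int64_t t = (int64_t)r * 4294967296LL + (int64_t)a * b;
            if (t % PLANT_Q != 0 || 2 * r > PLANT_Q || 2 * r < -PLANT_Q) {
                failures++;
            }
        }
    }
    return failures;
}
```

Compared with a Montgomery reduction, this form needs no separate correction step to bring the result back into a small signed range, since the output already lies within (-q/2, q/2).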