Closed dop-amin closed 5 months ago
Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work.
Best, Junhao
Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work.
Best, Junhao
Hi Junhao, thanks for reaching out.
For the stack optimized version (m4fstack
), I removed the asymmetric multiplication, so the code will be different from your PR which concerns the speed-variant.
I will try to make the files including your work into the stack-version be similar-looking to the ones from your PR, so it will not be confusing but we still can have separate versions. Then both PRs should be merge-able. What do you think?
Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work. Best, Junhao
Hi Junhao, thanks for reaching out. For the stack optimized version (
m4fstack
), I removed the asymmetric multiplication, so the code will be different from your PR which concerns the speed-variant.I will try to make the files including your work into the stack-version be similar-looking to the ones from your PR, so it will not be confusing but we still can have separate versions. Then both PRs should be merge-able. What do you think?
Hi @dop-amin , thnaks for your work! Yes, I think then these two PRs could be merged into m4f
and m4fstack
, respectively.
Thank you @dop-amin. That all looks very good to me. I've just completed the benchmarks. Can you please have a look and let me know if everything looks the way you expect it?
Thank you @dop-amin. That all looks very good to me. I've just completed the benchmarks. Can you please have a look and let me know if everything looks the way you expect it?
The stack usage is identical to my measurements and the cycle counts also match my expectation.
Thanks!
I see that all the files under dilithium3/m4fstack are actual files and not symlinks. may be some of them should be symlink to files in dilithium2/m4f
I see that all the files under dilithium3/m4fstack are actual files and not symlinks. may be some of them should be symlink to files in dilithium2/m4f
Hi, indeed this slipped through. The issue is addressed in #342.
This PR adds stack optimizations for ML-DSA based on the ideas from the paper "Dilithium for Memory Constrained Devices" and code already written by @mkannwischer in a separate branch. It also superseeds the changes in #222.
I attach numbers on the stack utilization below.
Some remaining questions:
How to handle my addition of the results from the paper "Revisiting Keccak and Dilithium Implementations on ARMv7-M" to this branch in the light of #338?Do we want to accept the trade-off of re-sampling $s_1$ in the key generation?
Dilithium3 keypair stack usage: 4408 sign stack usage: 6608 verify stack usage: 2704
Dilithium5 keypair stack usage: 4408 sign stack usage: 8136 verify stack usage: 2712