mupq / pqm4

Post-quantum crypto library for the ARM Cortex-M4
284 stars 72 forks source link

Dilithium/ML-DSA Stack Optimizations #340

Closed dop-amin closed 5 months ago

dop-amin commented 6 months ago

This PR adds stack optimizations for ML-DSA based on the ideas from the paper "Dilithium for Memory Constrained Devices" and code already written by @mkannwischer in a separate branch. It also superseeds the changes in #222.

I attach numbers on the stack utilization below.

Some remaining questions:

Dilithium3 keypair stack usage: 4408 sign stack usage: 6608 verify stack usage: 2704

Dilithium5 keypair stack usage: 4408 sign stack usage: 8136 verify stack usage: 2712

JunhaoHuang commented 6 months ago

Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work.

Best, Junhao

dop-amin commented 6 months ago

Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work.

Best, Junhao

Hi Junhao, thanks for reaching out. For the stack optimized version (m4fstack), I removed the asymmetric multiplication, so the code will be different from your PR which concerns the speed-variant.

I will try to make the files including your work into the stack-version be similar-looking to the ones from your PR, so it will not be confusing but we still can have separate versions. Then both PRs should be merge-able. What do you think?

JunhaoHuang commented 6 months ago

Hi @dop-amin, since you have already included our NTT code in this PR, I think we could just close PR #338 and just keep this PR. That would make it easier to merge your work. Best, Junhao

Hi Junhao, thanks for reaching out. For the stack optimized version (m4fstack), I removed the asymmetric multiplication, so the code will be different from your PR which concerns the speed-variant.

I will try to make the files including your work into the stack-version be similar-looking to the ones from your PR, so it will not be confusing but we still can have separate versions. Then both PRs should be merge-able. What do you think?

Hi @dop-amin , thnaks for your work! Yes, I think then these two PRs could be merged into m4f and m4fstack, respectively.

mkannwischer commented 5 months ago

Thank you @dop-amin. That all looks very good to me. I've just completed the benchmarks. Can you please have a look and let me know if everything looks the way you expect it?

dop-amin commented 5 months ago

Thank you @dop-amin. That all looks very good to me. I've just completed the benchmarks. Can you please have a look and let me know if everything looks the way you expect it?

The stack usage is identical to my measurements and the cycle counts also match my expectation.

mkannwischer commented 5 months ago

Thanks!

marco-palumbi commented 5 months ago

I see that all the files under dilithium3/m4fstack are actual files and not symlinks. may be some of them should be symlink to files in dilithium2/m4f

dop-amin commented 5 months ago

I see that all the files under dilithium3/m4fstack are actual files and not symlinks. may be some of them should be symlink to files in dilithium2/m4f

Hi, indeed this slipped through. The issue is addressed in #342.