integritychain / fips204

Pure Rust implementation of FIPS 204 Module-Lattice-Based Digital Signature Standard for server, desktop, browser and embedded applications.
Apache License 2.0
9 stars 1 forks source link

Increased stack space in 0.4.3 #6

Open DiscordJim opened 4 days ago

DiscordJim commented 4 days ago

When signing messages with private & public keys for MLDSA44, I am getting a stack overflow in 0.4.3 whereas in 0.4.0 the code executes fine.

Was there anything added between 0.4.0 and 0.4.3 that would substantially increase stack usage?

The code just gets a public key and a private key, and signs a message.

Would share, but it is proprietary.

integritychain commented 4 days ago

Yes, the newer version expands both the public and private keys to internally include some pre-computed elements that speed sign/verify. Classic space-time tradeoff (and the library avoids using the heap). I will investigate this over the next few days aiming to tighten things. Three things may help for the moment:

1) What platform/OS are you using? How large is your message and is it on the stack?

2) Could you add opt-level flags to your Cargo.toml as follows and let me know if it helps? This makes the compiler work a little harder to tighten things.

[profile.dev]
opt-level = 1

[profile.release]
opt-level = 1

3) Can you try RUST_MIN_STACK=104857600 cargo test on your code? The value I have put here is likely way larger than necessary and can likely be trimmed.

Thank you!

integritychain commented 4 days ago

Note also https://doc.rust-lang.org/stable/cargo/reference/profiles.html#overrides

It appears the clause below may also be sufficient. I think we may have to iterate together a bit to best resolve it (as the "fips204" syntax below is non-obvious to me).

# Set the default for dependencies.
[profile.dev.package."fips204"]
opt-level = 2
DiscordJim commented 4 days ago

I am on Windows 11, rust compiler 1.79.0, toolchain stable-x86_64-pc-windows-msvc.

  1. When I run with RUST_MIN_STACK=104857600 cargo test, the error does not occur. Why did you choose that value?
  2. Message is 74 bytes long.
  3. Adding the opt-level for fips204 in dev mode does not fix the issue.

Would it be possible to add heap allocated keys? Although I do not understand how a few thousand byte arrays would cause this. If help is required I'm willing to pitch in.

Can you detail the additional details you are storing? Maybe we could come up with a more space efficient way.

DiscordJim commented 3 days ago

Quick follow up, maybe an alloc feature could be in order?

integritychain commented 3 days ago

One of the primary requirements for the library is to run on an MCU without a heap allocator (at least for verify). So I would prefer to chase where the bloat is coming from for the moment. I believe Rust on Windows has a smaller default stack limit than Ubuntu, which partly explains why I didn't see this earlier. The 104857600 value was just an oversized guess to see if the problem could be temporarily bypassed.

Please could you rerun the RUST_MIN_STACK=104857600 cargo test with progressively smaller values to get a rough idea of where it breaks? From Windows-vs-Ubuntu, I am guessing between 1000000 and 2000000.

In parallel, I have a route towards removing two copies of a rather large object. I'll implement this either tonight or tomorrow and ping you a pointer.

DiscordJim commented 3 days ago

I will run this when I get home in an hour or two. Which function specifically is so stack heavy? Are you storing  in this new version? Or is it just the keys in general? Would like to take a look.

One of the primary requirements for the library is to run on an MCU without a heap allocator (at least for verify)

Fair enough. In the future an alloc feature flag could be useful.

integritychain commented 3 days ago

Have a look at the keygen_internal function here https://github.com/integritychain/fips204/blob/abebea17b0c28cc3489507ffb0b60d8ba3df05dc/src/ml_dsa.rs#L360 It precomputes cap_a_hat (<- the variable you are referencing above) and stores it in both the private and public structs.

The latest commit moves the expansion of rho->cap_a_hat to the sign and verify just where it is needed. It will be transparent to users. It should dramatically shrink stack usage based on my local (and a bit goofy setup) experiments. Try picking it up per the directions at https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories. Something like this should work:

[dependencies]
fips204 = { git = "https://github.com/integritychain/fips204.git" }
DiscordJim commented 2 days ago

As a follow-up, without setting the stack space, stack overflow. Through trial and error, I see the error stops occurring around here,

RUST_MIN_STACK=2757600

Can confirm the latest version on git (e9553c24ee169b863119a5a61f8099a505fbc9cb) works as advertised. No stack overflow.

Two quick questions for you.

  1. When is a new version slated for release?
  2. Is there a feature road map?

Additionally, thank you for this great work.

integritychain commented 2 days ago

Thanks for this!

I need to find a few quality days to polish up some internal comments and maybe refactor a small bit of code. No externally-visible changes. I hope to do this next week, and then will release a new version. Please use the git reference until then.

As far as features, my ambition is to make this super robust and super accessible ... so more testing, more support, better docs, code reviews etc. At the moment, I don't have any new features in view as I feel it should do one thing well and avoid complexity. That said, there are interesting options for performance improvements.

I'll leave this issue open for the moment, and close it when the next release gets published.