Open DiscordJim opened 4 days ago
Yes, the newer version expands both the public and private keys to internally include some pre-computed elements that speed sign/verify. Classic space-time tradeoff (and the library avoids using the heap). I will investigate this over the next few days aiming to tighten things. Three things may help for the moment:
1) What platform/OS are you using? How large is your message and is it on the stack?
2) Could you add opt-level
flags to your Cargo.toml as follows and let me know if it helps? This makes the compiler work a little harder to tighten things.
[profile.dev]
opt-level = 1
[profile.release]
opt-level = 1
3) Can you try RUST_MIN_STACK=104857600 cargo test
on your code? The value I have put here is likely way larger than necessary and can likely be trimmed.
Thank you!
Note also https://doc.rust-lang.org/stable/cargo/reference/profiles.html#overrides
It appears the clause below may also be sufficient. I think we may have to iterate together a bit to best resolve it (as the "fips204" syntax below is non-obvious to me).
# Set the default for dependencies.
[profile.dev.package."fips204"]
opt-level = 2
I am on Windows 11, rust compiler 1.79.0, toolchain stable-x86_64-pc-windows-msvc
.
RUST_MIN_STACK=104857600 cargo test
, the error does not occur. Why did you choose that value?opt-level
for fips204
in dev
mode does not fix the issue.Would it be possible to add heap allocated keys? Although I do not understand how a few thousand byte arrays would cause this. If help is required I'm willing to pitch in.
Can you detail the additional details you are storing? Maybe we could come up with a more space efficient way.
Quick follow up, maybe an alloc
feature could be in order?
One of the primary requirements for the library is to run on an MCU without a heap allocator (at least for verify). So I would prefer to chase where the bloat is coming from for the moment. I believe Rust on Windows has a smaller default stack limit than Ubuntu, which partly explains why I didn't see this earlier. The 104857600 value was just an oversized guess to see if the problem could be temporarily bypassed.
Please could you rerun the RUST_MIN_STACK=104857600 cargo test
with progressively smaller values to get a rough idea of where it breaks? From Windows-vs-Ubuntu, I am guessing between 1000000 and 2000000.
In parallel, I have a route towards removing two copies of a rather large object. I'll implement this either tonight or tomorrow and ping you a pointer.
I will run this when I get home in an hour or two. Which function specifically is so stack heavy? Are you storing  in this new version? Or is it just the keys in general? Would like to take a look.
One of the primary requirements for the library is to run on an MCU without a heap allocator (at least for verify)
Fair enough. In the future an alloc
feature flag could be useful.
Have a look at the keygen_internal function here https://github.com/integritychain/fips204/blob/abebea17b0c28cc3489507ffb0b60d8ba3df05dc/src/ml_dsa.rs#L360 It precomputes cap_a_hat (<- the variable you are referencing above) and stores it in both the private and public structs.
The latest commit moves the expansion of rho->cap_a_hat to the sign and verify just where it is needed. It will be transparent to users. It should dramatically shrink stack usage based on my local (and a bit goofy setup) experiments. Try picking it up per the directions at https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories. Something like this should work:
[dependencies]
fips204 = { git = "https://github.com/integritychain/fips204.git" }
As a follow-up, without setting the stack space, stack overflow. Through trial and error, I see the error stops occurring around here,
RUST_MIN_STACK=2757600
Can confirm the latest version on git (e9553c24ee169b863119a5a61f8099a505fbc9cb) works as advertised. No stack overflow.
Two quick questions for you.
Additionally, thank you for this great work.
Thanks for this!
I need to find a few quality days to polish up some internal comments and maybe refactor a small bit of code. No externally-visible changes. I hope to do this next week, and then will release a new version. Please use the git reference until then.
As far as features, my ambition is to make this super robust and super accessible ... so more testing, more support, better docs, code reviews etc. At the moment, I don't have any new features in view as I feel it should do one thing well and avoid complexity. That said, there are interesting options for performance improvements.
I'll leave this issue open for the moment, and close it when the next release gets published.
When signing messages with private & public keys for MLDSA44, I am getting a stack overflow in 0.4.3 whereas in 0.4.0 the code executes fine.
Was there anything added between 0.4.0 and 0.4.3 that would substantially increase stack usage?
The code just gets a public key and a private key, and signs a message.
Would share, but it is proprietary.