Should PQCP implementations ship `randombytes()`

mkannwischer commented 3 months ago

During the 2024-07-18 TSC meeting, we were discussing https://github.com/pq-code-package/tsc/issues/4.

Let me try to summarize (please correct me if I got something wrong): There appears to be consensus that the derandomized APIs should be supported by all implementations in the PQ Code Package. The discussion then drifted towards if the randomized APIs should also be required or not. Some opinions were raised that the derandomized versions are much easier to integrate into other libraries/software as secure randomness can be provided from the outside. This lead to discussions whether or not we should take on the responsibility of providing a good source of randomness at all or if this should be expected to be provided by the user.

While these discussions were mostly about APIs, I think it makes sense to discuss this separately: Do we want to include a decent default randombytes implementation or not? If we don't, then not supporting the randomized APIs sounds like a reasonable choice as it will ease the integration. If we do, then the randomized APIs are easier to use and harder to misuse. Some strong opinions on this have been voiced, and the conclusion was that we need to discuss this in a broader group. We may want a TSC vote on this in case we want this to be consistent across projects.

mkannwischer commented 3 months ago

My view is that the developers and maintainers of pqcp are (hopefully) more qualified to pick a randombytes implementation that is less likely going to result in disasters. This at least applies to common platforms and operating systems. This may not matter for integrations into other big crypto libraries like liboqs or mbedtls, but it does matter for smaller projects developed by people with less cryptography experience.

I'd very much prefer to provide a good default choice even if we want to document that user should replace the randomness generation with their own suitably secure randomness generator. I think we should make it easy to replace it, but very hard to misuse.

I'm worried that we will see pqcp code being used in the wild with hard-coded randomness - because who really knows what coins is supposed to be? I am not sure how we can avoid this if we really decide to only support the derandomized APIs.

To confirm my point, I asked ChatGPT to use the API we proposed in #4 (mostly for fun - I don't think this is what we should worry about most). It produced the following code:

...
uint8_t coins[32];     // Random coins

// In a real-world scenario, you would obtain these from a proper source
for (int i = 0; i < 32; i++) {
    coins[i] = (uint8_t)(255 - i);
}

// Call the pqcp_mlkem512_ref_enc_derand function
int result = pqcp_mlkem512_ref_enc_derand(ct, ss, pk, coins);

This results in convieniently useless crypto. At least ChatGPT seems to know that you should not do it like this, but who will really read that comment here? I am also curious where it learned that this is a good example of initializing randomness.

hanno-becker commented 3 months ago

I lean towards thinking that providing a good source of randomness is a difficult problem on its own that is outside of the scope of PQCP and should be handled by whatever higher-level stack PQCP will be used with.

Some thoughts on the risk @mkannwischer flags regarding people directly using the PQCP API and doing it badly by providing poor random coins:

As @mkannwischer says, we should distinguish the question of scope from the question of API. The issue of a coins parameter that's tempting to use with 0's first and then forget about it, is one of API, not one of scope. If we decide randomness is out of scope, one could instead consider an API with an unimplemented randombytes hook that users need to link against, and instead of a coins parameter the user has to pass a randomness context. Something like int mlkem512_ref_enc(uint8_t *ct, uint8_t *ss, const uint8_t *pk, void *rnd_ctxt); where rnd_ctxt is an opaque pointer passed to a user-provided stateful randombytes (this is a common pattern in Mbed TLS at least). I'm not saying that this is necessarily the best API, but it would be easier to flag in the documentation of randombytes() that this MUST be a proper source of randomness, and it takes more effort for a user to ignore -- so certainly there's options to consider other than necessarily having a plain coins parameter.
How likely are customers of PQCP that do just use PQCP and not also another cryptographic stack? Using just PQCP seems like a bad idea because you should be using some hybrid PK scheme, not solely rely on MLKEM. So I'm not sure if we even serve the customer well by making it easy to use PQCP standalone.

cothan commented 3 months ago

I lean towards providing randombytes() in PQCP code. I like the idea of "it works." When "it doesn't work", the as-is PQCP randombytes() "shouldn't compile", then the developer must sit down and read the manual.

PQCP should ship the randombytes() function for the "it works!" idea. This will make the majority happy.
When it doesn't work, any potential error should appear at compile time, IMO. In that case, I'm sure the developers know what they are dealing with on their platforms/OS, perhaps they already have a solution somewhere in their stack.

I'm concerned that if randombytes() isn't included in PQCP or if it complicates the API, things can easily go wrong.

dstebila commented 2 months ago

If there's a randomized version in the API, then PQCP is going to need an implementation of randombytes at least for testing. So then the question becomes whether you make that testing implementation a serious implementation or not. If the testing implementation isn't done properly, there's a risk that someone will copy that code, thinking it's okay, even though it's not.

franziskuskiefer commented 1 month ago

A question we need to answer here is how we expect folks to use the code here.

Integration into an existing crypto library will most likely NOT require a randombytes function. No serious crypto library can do without a proper rng.
Integration into other (potentially, not crypto) code. This would be the "it works" case as @cothan called it, and would require a randombytes function.

If we actually expect usage outside of crypto library, we have to provide randomness. I personally think people really shouldn't use it in this way. But at the same time do I expect people to actually use it that way 😔. Which implies that we should provide something.

planetf1 commented 1 month ago

One of the consumers of the implementations here in general would be something like liboqs - effectively a cryptolibrary. That would likely take the responsibility to provide a rng. Or either via liboqs or directly into say OpenSSL - again with rng. All examples of 1. above.

This seems most likely, but I still agree with @franziskuskiefer 's last point that we do include a rng (with clear documented caveats etc) since some I think will use directly

hanno-becker commented 1 month ago

I think the barrier created by not providing an RNG is a more effective way of conveying that there's non-trivial work to be done here, than providing an insecure default and aiming to grab people's attention through documentation and warning messages.

I remain opposed to shipping randombytes() -- it's out of scope IMO.

pq-code-package / tsc

Should PQCP implementations ship `randombytes()` #86