DavidBuchanan314 commented 1 year ago

Documentation

I recently needed to generate some cryptographic key material. As far as I can tell, secrets.token_bytes() is the recommended way of doing this, and cpython's concrete implementation appears to be suitable for this purpose. However, the precise wording of the documentation for the API does not make this 100% clear.

The token subsection says:

The secrets module provides functions for generating secure tokens, suitable for applications such as password resets, hard-to-guess URLs, and similar.

And, under token_bytes() specifically:

Return a random byte string containing nbytes number of bytes. [...]

In my very pedantic reading of this, it does not make the sufficient guarantees for securely generating cryptographic key material. It does guarantee that generated tokens will be hard-to-guess, but that's a weaker guarantee than is sometimes necessary.

As an example, generating ECDSA signatures requires a uniform-random nonce value. If even a single bit of that nonce is biased one way or another, over a sufficient number of signatures, then an attacker can use that information to recover the private key (see https://eprint.iacr.org/2019/023.pdf )

As such, I think the documentation for these methods should be amended to explicitly state that the generated bytes are uniformly random, or to otherwise explicitly state that the direct output of secrets.token_bytes() may be suitable for generating cryptographic key material.

To be clear, I am not requesting any changes to the current implementation, merely that the docs accurately reflect what I assume to be the intended and current behaviour.

stevendaprano commented 1 year ago

You have to read the documentation of the whole module, not just individual functions. The very first sentence begins "The secrets module is used for generating cryptographically strong random numbers" and that statement applies to all of the functions in the module.

There are currently seven secrets functions or classes which return some sort of randomness (a number, token, string etc). I don't think we need to repeat that same statement about being cryptographically strong in every function.

Do you disagree?

stevendaprano commented 1 year ago

@gpshead, @dstufft what do you think, is this a necessary documentation change?

gpshead commented 1 year ago

To document this pedantically to guarantee some properties, the implementation and test cases need to be reviewed. Updating the docs makes sense to me so long as we don't make incorrect claims.

stevendaprano commented 1 year ago

Such a review is beyond my knowledge. If somebody with a good security background wants to move forward with this, please do :-)

NeedsMoar commented 1 year ago

The very first sentence begins "The secrets module is used for generating cryptographically strong random numbers" and that statement applies to all of the functions in the module.

The class in secrets calls through a painfully long chain of very similarly named and sometimes aliased functions with variants of "random" in the name. What you'll eventually reach is the non-blocking version of urandom on linux, and the comment on that function specifically says it isn't cryptographically secure (the blocking version apparently isn't either). This is contrary to the docs that say the blocking version is used for security. Apparently it is / was some mixture of RDRAND and other "hardware entropy", although RDSEED is the non-deterministic secure version produced from sampling silicon thermals and would be better if available (although much slower). If the processor lacks RDRAND it uses whatever linux generates for urandom which isn't cryptographically secure. On Windows it calls BCryptGenRandom which implements a NIST SP 800-90A CTR_DRBG compliant generator, so probably RDRAND. Attacking RDRAND requires a complex replacement of the instruction itself but it uses a shared buffer between all cores which could be exploited and was patched in microcode. Earlier windows uses FIPS 186-2 compliant generation.

Now the big but here is that it appears as though those functions are only called to seed a PRNG which may or may not fit the cryptographically secure definition and the even bigger but is that should either of the system calls fail for some reason (which they can; I think RDRAND is required to fail in some way if its internal module which is separate from the rest of the processor cores detects that for whatever reason it's not getting random data anymore), it falls back to using the PID and a shuffled version of the system time which is definitely not a secure seed method. In multiple runs over a day the entropy might not vary much if the system isn't doing much else, and who knows if the PRNG was ever audited.

That's as far as I'm willing to go with it, I may have missed something. If it's calling the seed function every time things should be good since the instructions haven't been proven to have flaws or generate repeatable sequences, but because of the microcode patch to fix what was practically a zero security risk outside of Intel's worthless SGX it's slow as heck without the mitigations disabled which can be done on a per-core basis.

dstufft commented 1 year ago

urandom is cryptographically secure.

python / cpython

docs: `secrets.token_bytes()` has imprecise security properties #101996

Documentation