Closed joshtriplett closed 1 month ago
I've updated this proposal to build upon the now-accepted ACP 393, along with the changes proposed by @joboet.
I don't think "insecure" is the important part of these. In fact, it is downright misleading since ChaCha is a CSPRNG. The key property of this RNG is that it is seedable and once seeded is guaranteed to always return the same results. A better name would be SeedableRandomSource.
@Amanieu
Given sufficient confidence in ChaCha as an algorithm, it's certainly possible for us to create an implementation that 1) implements all the necessary protections a secure implementation might want (e.g. protection against fork, protection against VM fork, memory locking, protection keys where available), 2) commits to fixing those if there's an issue with them, 3) commits to addressing future scenarios we haven't thought of yet, and 4) still provides the same outputs in future versions.
However, it's much, much easier and more maintainable for the uses requiring secure randomness to call getrandom or equivalent (via DefaultRandomSource), and delegate secure randomness to the kernel and/or hardware. That means we don't have to take on that maintenance burden ourselves.
The point of this proposal is to have a "good enough" RNG that doesn't make some of those promises, and is easy to maintain, and provides seedability.
If people really want to name this ChaChaRandomSource we can. But I still think it's important to disclaim any potential claims about security that people might otherwise assume, and point people towards DefaultRandomSource for secure randomness. (I've expanded the doc comments in this proposal accordingly.)
Is my understanding correct that this would fulfill the Math.random() use case?
I don't think "insecure" is the important part of these. In fact, it is downright misleading since ChaCha is a CSPRNG.
So was RC4 in the distant past. The issue is that no practical cryptographic algorithm rests on proofs that the problem is outside P, and how could they, since P!=NP has not been proven yet. And many symmetric algorithms do not rest on mathematical proofs at all.
So even if we assume today that they're secure for practical purposes, we cannot promise so for the future. This is why josh is correct about the "pick any two" aspect.
If we promise seedability and stability then we should be prepared for the possibility that it one day will be considered insecure. At which point we would not want to be in a situation where users rely on it for security purposes, which means actively discouraging its use for such purposes.
Allowing "replay" of such games by providing a seed
I just want to note that a PRNG is likely a poor choice for this specific use case; noise based RNG (link: GDC talk) is what is actually appropriate. Specifically, PRNGs are limited to advancing their state forward. RNG derived from chaotic integer noise (essentially a hash function) on the other hand is random access, allowing replays to be jumped around without burning CPU time spinning the PRNG.
Of course other constraints might still prevent random seeks, and “keyframe” style solutions to those can be applied to the PRNG as well, but (since this is a notable use case for using deterministic RNG) this still bears mentioning.
Not all RNGs are trapdoors, e.g. the ChaCha keystream is seekable and can double as an RNG. Block ciphers in counter mode would do too. It's mostly dedicated cryptographic RNG constructions that are trapdoors to provide forward secrecy, which isn't relevant for insecure ones.
Though you'd still need an API for seeking.
@Amanieu Changed to DeterministicRandomSource, and added the unresolved question on get_seed.
We discussed this in the libs-api meeting today and are happy to accept this, with the get_seed issue to be resolved in the implementation PR.
I've started implementing this and wondered whether using ChaCha20 might be a little overkill. This generator should never be used for anything remotely security-sensitive (unless we want to guarantee resistance against timing-attacks, which I'm quite opposed to), so the computational complexity of ChaCha doesn't actually give us anything. Wouldn't a simpler RNG suffice, as long as it yields good-quality output?
you could use ChaCha8, which is ChaCha20 but with a lot less rounds: https://en.wikipedia.org/wiki/Salsa20#Reduced-round_ChaCha
another option is using AES-128 in CTR mode, which should be quite fast for the processors that include AES acceleration instructions. it's basically:
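// AES-128 in CTR mode: field 0 is the key, field 1 is the counter.
struct State(u128, u128);

// Placeholder for AES-128 block encryption; the body is left unimplemented here.
fn aes_128(key: u128, input: u128) -> u128 {
    unimplemented!("AES-128 block encryption of `input` under `key`")
}

impl State {
    fn next(&mut self) -> u128 {
        let retval = aes_128(self.0, self.1);
        // Wrap around instead of panicking on overflow in debug builds.
        self.1 = self.1.wrapping_add(1);
        retval
    }
}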
it has the added benefit that it's really easy to seek to any location in the output allowing you to generate output in parallel by copying the state to multiple threads and seeking each state to a different position.
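To make the seek-and-parallelize point concrete, here is a minimal sketch, not a proposed implementation. Everything in it is an assumption for illustration: `block` is a SplitMix64-style mixer standing in for `aes_128` (it is not a cipher), and `CtrState`, `seek`, `next_block`, and `generate_parallel` are hypothetical names.

```rust
use std::thread;

/// Hypothetical stand-in for a block function such as `aes_128`.
/// SplitMix64-style mixing, NOT a cipher and not vetted for quality here.
fn block(key: u64, counter: u64) -> u64 {
    let mut z = key.wrapping_add(counter.wrapping_add(1).wrapping_mul(0x9E37_79B9_7F4A_7C15));
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Counter-mode state: a key plus the index of the next output block.
#[derive(Clone, Copy)]
struct CtrState {
    key: u64,
    counter: u64,
}

impl CtrState {
    fn new(key: u64) -> Self {
        Self { key, counter: 0 }
    }

    /// O(1) seek: the next output depends only on (key, counter).
    fn seek(&mut self, position: u64) {
        self.counter = position;
    }

    fn next_block(&mut self) -> u64 {
        let out = block(self.key, self.counter);
        self.counter = self.counter.wrapping_add(1);
        out
    }
}

/// Produce `total` blocks by copying the state to `workers` threads and
/// seeking each copy to the start of its own contiguous range.
fn generate_parallel(state: CtrState, total: usize, workers: usize) -> Vec<u64> {
    let chunk = total.div_ceil(workers.max(1));
    let base = state.counter;
    let handles: Vec<_> = (0..workers.max(1))
        .map(|w| {
            let mut local = state; // each thread gets its own copy
            let start = (w * chunk).min(total);
            let end = (start + chunk).min(total);
            thread::spawn(move || {
                local.seek(base.wrapping_add(start as u64));
                (start..end).map(|_| local.next_block()).collect::<Vec<u64>>()
            })
        })
        .collect();
    // Joining in spawn order reproduces the sequential output order.
    handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
}
```

Because each output is a pure function of the key and the counter, the concatenated per-thread results match what a single thread would produce by calling `next_block` repeatedly.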
@joboet wrote:
I've started implementing this and wondered whether using ChaCha20 might be a little overkill. This generator should never be used for anything remotely security-sensitive (unless we want to guarantee resistance against timing-attacks, which I'm quite opposed to), so the computational complexity of ChaCha doesn't actually give us anything. Wouldn't a simpler RNG suffice, as long as it yields good-quality output?
That seems reasonable to me.
@programmerjake wrote:
you could use ChaCha8, which is ChaCha20 but with a lot less rounds
That sounds completely reasonable for the insecure generator.
another option is using AES-128 in CTR mode, which should be quite fast for the processors that include AES acceleration instructions
But probably slower for those that don't, and the generator should produce the same results on every target. AFAICT ChaCha-based algorithms will be substantially faster on anything without hardware AES support, and reasonably fast everywhere in any case.
something that could be even better than ChaCha8 or AES-128 in CTR mode is to use a counter-based random number generator such as the Squares RNG -- it has the benefits of being very fast, having high-quality output (though I'm assuming not cryptographic quality), and being O(1) seekable which makes parallelization much easier.
@programmerjake I'm going to leave a call like that to cryptographers like @traviscross. My understanding is that ChaCha has had more scrutiny.
@programmerjake I'm going to leave a call like that to cryptographers like @traviscross. My understanding is that ChaCha has had more scrutiny.
yes, but if it's specifically insecure, why do you need cryptographic properties? I recognize that that means you have better randomness, but if it's good enough for everything but cryptography, much faster, and O(1) seekable (seekability is necessary for some applications), then imo that means it's good enough for std's insecure RNG.
Counter-based RNGs seem like a good option, they should allow more instruction-level parallelism. My only worry with Squares is that its quality depends on the key:
The key should be an irregular bit pattern with roughly half ones and half zeros.
so when users do something like DeterministicRandomSource::from_seed(0), they'll get unusable data.
This constrains the number of usable keys, so the key space is no longer really $2^{64}$. So perhaps some other CBRNG like Threefry or Philox would be better suited?
Before worrying too much we should actually measure how many nanoseconds it takes to get a few bytes from chacha.
As long as it's unstable we can still change the algorithm.
Going with a reduced-round version is probably a good idea though to send the right message about security.
My only worry with Squares is that its quality depends on the key:
The key should be an irregular bit pattern with roughly half ones and half zeros.
so when users do something like DeterministicRandomSource::from_seed(0), they'll get unusable data.
Well, you could have the seed be used as the counter value instead of the key value (though that has the problem that nearby seeds generate similar random sequences, just shifted in time), or could have the key be a good hash of the seed, where good means having the Avalanche effect. Maybe SipHash since we already have that in std and seeding the RNG is less performance critical than generating numbers or seeking?
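A minimal sketch of the "hash the seed to get the key" idea, under stated assumptions: `key_from_seed` is a hypothetical helper, and it uses `DefaultHasher` only because that is the hasher std exposes today. Its algorithm is explicitly unspecified and may change between releases, so a generator that promises stable output would need a pinned hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical helper: derive a generator key from a user-supplied seed so
/// that simple seeds such as 0 or 1 do not become simple keys.
///
/// Assumption: `DefaultHasher::new()` is used purely for illustration; its
/// algorithm is not guaranteed to stay the same across Rust releases.
fn key_from_seed(seed: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write_u64(seed);
    hasher.finish()
}
```

Hashing makes a well-mixed key very likely but does not by itself enforce any structural requirement a particular generator places on its keys, so that would still need checking for Squares specifically.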
Before worrying too much we should actually measure how many nanoseconds it takes to get a few bytes from chacha.
except that iirc chacha is not seekable (edit: chacha is seekable), which i think is an important property that our RNG should have.
Before worrying too much we should actually measure how many nanoseconds it takes to get a few bytes from chacha.
for comparison against squares-rng, for the squares32 variant I got compiler explorer to generate 1GiB of random output in 0.24s on one thread with clang -march=znver4 -O3 (you have to refresh a few times to get the AVX512 capable runner): https://gcc.godbolt.org/z/9xTYKn4bo
(edit: rerunning again it got 0.19s)
I will be very surprised if ChaCha8 gets anywhere near that fast.
I have used counter-based PRNGs in research and think they would be incredibly useful functionality-wise for an insecure seeded and reproducible PRNG:
reseeding, jumping, and forking are all trivial to support, so we would in no way limit the operations one can perform on the PRNG
the inner state, the counter, is just an integer that can be exposed as a public field or with setter methods
we can choose any hash function with good avalanching
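The three points above can be sketched roughly as follows. This is an illustration only: `CounterRng`, `jump`, and `fork` are hypothetical names, the fork rule (derive the child key from one parent output) is just one possible choice, and `mix` is a SplitMix64-style function picked for brevity with no claim about its statistical quality in this construction.

```rust
/// Stand-in hash with reasonable avalanching (SplitMix64-style).
/// Any other well-mixing hash could be substituted.
fn mix(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Counter-based generator whose entire state is two plain integers.
#[derive(Clone, Debug, PartialEq)]
pub struct CounterRng {
    pub key: u64,
    /// Exposed directly: setting it is a seek.
    pub counter: u64,
}

impl CounterRng {
    /// (Re)seeding is just constructing a fresh value.
    pub fn from_seed(seed: u64) -> Self {
        Self { key: mix(seed), counter: 0 }
    }

    pub fn next_u64(&mut self) -> u64 {
        let out = mix(self.key ^ mix(self.counter));
        self.counter = self.counter.wrapping_add(1);
        out
    }

    /// Skip `n` outputs in O(1).
    pub fn jump(&mut self, n: u64) {
        self.counter = self.counter.wrapping_add(n);
    }

    /// Split off a child generator keyed by one output of the parent.
    pub fn fork(&mut self) -> Self {
        let child_key = self.next_u64();
        Self { key: child_key, counter: 0 }
    }
}
```

Jumping by `n` leaves the generator in the same state as drawing and discarding `n` values, which is what makes replays and parallel splits cheap.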
I will be very surprised if ChaCha8 gets anywhere near that fast.
I don't know what the clock speed is on those machines, but standard ChaCha8 on Zen 4 runs at <= 0.35 cycles per byte when you generate a lot of output at once. I don't know if that's using AVX2 or AVX-512. The slightly tweaked chacha8rand cuts out some unnecessary work without affecting security and my not very optimized AVX2 implementation (1024 byte buffer) achieves ca. 0.5 cycles per byte on a Skylake CPU and could likely go even faster with AVX-512. Other people report similar figures for plain ChaCha8 implemented with AVX2. I would consider that close enough.
AVX2
Can we rely on the standard library being built with AVX2?
when you generate a lot of output at once
What about when generating a lot of small output? I would naively believe many users would use many short outputs rather than large chunks at a time. Sure, std can of course buffer the output but then that raises the complexity of storing a reasonably sized buffer somewhere and managing it.
AVX2
Can we rely on the standard library being built with AVX2?
No mainstream x86 target enables it by default. Runtime feature detection in core (https://github.com/rust-lang/rfcs/pull/3469) would enable using AVX2 anyway. But ChaCha8 with 128b SIMD is already pretty fast so wider SIMD isn’t very important unless you need multiple GB/s of random data (the vast majority of programs don’t).
What about when generating a lot of small output? I would naively believe many users would use many short outputs rather than large chunks at a time. Sure, std can of course buffer the output but then that raises the complexity of storing a reasonably sized buffer somewhere and managing it.
Any design based on ChaCha or a block cipher will want some buffering because the minimum amount of data you can generate at once is several times larger than the four or eight bytes that’s often requested.
Having a buffer is different from statistical RNGs that generate one u32 or u64 at a time, and it is technically some extra state, but after implementing it myself I think it’s actually a better fit for the gen_bytes style interface. Without next_uN style methods in the mix, you can just store the buffer as a [u8; N] field and treat all read sizes uniformly: while more bytes are needed, refill buffer if empty, then copy min(requested, available) bytes into dest and update lengths/pointers. No need to think about reads that aren’t a multiple of your word size, no need to fix up byte order if you care about consistent results across platforms, and no temptation to add a different code path for large reads where you might want loop unrolling or SIMD even for a “word at a time” generator.
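The read loop described above could look roughly like this. It is a sketch under assumptions, not the actual implementation: `BufferedSource`, `fill_bytes`, `refill`, and the 64-byte buffer size are hypothetical, and `stand_in_block` is a SplitMix64-style placeholder for a real block function such as ChaCha8.

```rust
const BUF_LEN: usize = 64;

/// Placeholder for a real block function such as ChaCha8 (NOT a cipher).
fn stand_in_block(key: u64, counter: u64) -> u64 {
    let mut z = key.wrapping_add(counter.wrapping_add(1).wrapping_mul(0x9E37_79B9_7F4A_7C15));
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

struct BufferedSource {
    key: u64,
    counter: u64,
    buf: [u8; BUF_LEN],
    /// Index of the first unread byte; `BUF_LEN` means the buffer is empty.
    pos: usize,
}

impl BufferedSource {
    fn from_seed(seed: u64) -> Self {
        Self { key: seed, counter: 0, buf: [0; BUF_LEN], pos: BUF_LEN }
    }

    /// Regenerate the whole buffer. Words are stored little-endian so the
    /// byte stream is identical on every target.
    fn refill(&mut self) {
        for chunk in self.buf.chunks_exact_mut(8) {
            let word = stand_in_block(self.key, self.counter);
            self.counter = self.counter.wrapping_add(1);
            chunk.copy_from_slice(&word.to_le_bytes());
        }
        self.pos = 0;
    }

    /// One code path for every read size: refill when empty, then copy
    /// min(requested, available) bytes and advance both positions.
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        let mut written = 0;
        while written < dest.len() {
            if self.pos == BUF_LEN {
                self.refill();
            }
            let n = (dest.len() - written).min(BUF_LEN - self.pos);
            dest[written..written + n].copy_from_slice(&self.buf[self.pos..self.pos + n]);
            self.pos += n;
            written += n;
        }
    }
}
```

With this shape, a sequence of small reads yields exactly the same bytes as one large read of the same total length, regardless of how the reads line up with the buffer boundary.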
Note that there's a tracking issue, we should continue the discussion there: https://github.com/rust-lang/rust/issues/131606
API design partially based on discussions with @BartMassey. Revised based on feedback, in particular from @Amanieu and @joboet.
Proposal
Problem statement
People regularly want to generate random numbers for a variety of use cases. Doing so currently requires using a crate from crates.io. rand is the 6th most downloaded crate on crates.io, and fastrand is quite popular as well. Many, many people over the years have expressed frustration that random number generation is not available in the standard library (despite the standard library using randomness internally for HashMap).
There are multiple reasons why we have not historically added this capability. Primarily, there are three capabilities people want, and those capabilities seem to present a "pick any two" constraint:
These constraints arise from the possibility of a secure random number generator potentially requiring updates for security reasons. Changing the random number generator would result in different sequences for the same seed.
In addition to that primary constraint, there have also been design difficulties: there are numerous pieces of additional functionality people may want surrounding random number generation, which makes any proposal for it subject to massive scope creep and bikeshed painting. Most notably: users of random numbers may want to represent the state of the RNG explicitly as something they can pass around, or implicitly as global state for simplicity.
This ACP proposes a solution that aims to be as simple as possible, satisfy all the stated constraints on the problem, and allow for future expansion if desired. This ACP handles the "pick any two" constraint above by providing a seedable random source that is explicitly identified as insecure. This will allow us to keep the insecure seedable generator the same across Rust versions and targets.
This ACP builds on the accepted ACP 393, which added an RNG that is secure but does not guarantee identical output across Rust versions.
Motivating examples or use cases
Solution sketch
This builds upon the APIs added in ACP 393, and only specifies the new APIs.
This would live in the core::random module.
This ACP proposes using a simple implementation of a ChaCha-based RNG.
The seeded insecure random number generator, given the same seed, will provide the same sequence of random numbers, on all targets.
Alternatives
We could avoid providing seeded random number generation at all, and refer people who need seeded random number generation to external crates.
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: