joshtriplett commented 5 months ago

API design partially based on discussions with @BartMassey. Revised based on feedback, in particular from @Amanieu and @joboet.

Proposal

Problem statement

People regularly want to generate random numbers for a variety of use cases. Doing so currently requires using a crate from crates.io. rand is the 6th most downloaded crate on crates.io, and fastrand is quite popular as well. Many, many people over the years have expressed frustration that random number generation is not available in the standard library (despite the standard library using randomness internally for HashMap).

There are multiple reasons why we have not historically added this capability. Primarily, there are three capabilities people want, and those capabilities seem to present a "pick any two" constraint:

Secure random number generation
Seedable random number generation
Stability between versions of Rust when seeded

These constraints arise from the possibility of a secure random number generator potentially requiring updates for security reasons. Changing the random number generator would result in different sequences for the same seed.

In addition to that primary constraint, there have also been design difficulties: there are numerous pieces of additional functionality people may want surrounding random number generation, which makes any proposal for it subject to massive scope creep and bikeshed painting. Most notably: users of random numbers may want to represent the state of the RNG explicitly as something they can pass around, or implicitly as global state for simplicity.

This ACP proposes a solution that aims to be as simple as possible, satisfy all the stated constraints on the problem, and allow for future expansion if desired. This ACP handles the "pick any two" constraint above by providing a seedable random source that is explicitly identified as insecure. This will allow us to keep the insecure seedable generator the same across Rust versions and targets.

This ACP builds on the accepted ACP 393, which added an RNG that is secure but does not guarantee identical output across Rust versions.

Motivating examples or use cases

Generating randomness for games (e.g. rolling dice).
- Allowing "replay" of such games by providing a seed
Randomly shuffling a list
Randomly sampling elements from a set
Random test case generation / fuzz testing
- Allowing reproduction of such test cases by providing a seed

Solution sketch

This builds upon the APIs added in ACP 393, and only specifies the new APIs.

This would live in the core::random module.

/// A seeded, insecure random number generator.
///
/// This source uses the ChaCha CSPRNG, but does not claim to be hardened for security; for secure
/// randomness use `DefaultRandomSource` or another `RandomSource`.
///
/// This source does not attempt to protect against `fork`, VM forks, or similar; if the process or
/// VM forks, this source will return the same series of outputs in both forks.
pub struct DeterministicRandomSource { /* no public fields */ }

impl DeterministicRandomSource {
    /// Create a seeded random source from the specified seed.
    ///
    /// Two instances of this source created from the same seed will produce the same series of
    /// outputs, even on different Rust versions or different targets.
    ///
    /// This generator makes no promises about security, and will compromise security for determinism
    /// if the two come into conflict. Do not use this for secure randomness.
    pub fn from_seed(seed: [u8; 32]) -> Self { /* ... */ }

    /// Get the original seed of this random source.
    // Unresolved question: does the generator need to store the seed for purposes other than this
    // method? If not, do we want this, or should people have to store it separately?
    pub fn seed(&self) -> [u8; 32] { /* ... */ }
}

impl RandomSource for DeterministicRandomSource { /* ... */ }

// Note that the Debug impl does not expose the seed or internal state.
impl Debug for DeterministicRandomSource { /* ... */ }

// A cloned copy of a random source will return the same series of outputs.
//
// The `DeterministicRandomSource` does not implement `Copy`, so that producing two sources with the
// same outputs requires an explicit clone.
impl Clone for DeterministicRandomSource { /* ... */ }

This ACP proposes using a simple implementation of a ChaCha-based RNG.

The seeded insecure random number generator, given the same seed, will provide the same sequence of random numbers, on all targets.

Alternatives

We could avoid providing seeded random number generation at all, and refer people who need seeded random number generation to external crates.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

We think this problem seems worth solving, and the standard library might be the right place to solve it.
We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

joshtriplett commented 3 months ago

I've updated this proposal to build upon the now-accepted ACP 393, along with the changes proposed by @joboet.

Amanieu commented 3 months ago

I don't think "insecure" is the important part of these. In fact, it is downright misleading since ChaCha is a CSPRNG. The key property of this RNG is that it is seedable and once seeded is guaranteed to always return the same results. A better name would be SeedableRandomSource.

joshtriplett commented 3 months ago

@Amanieu

Given sufficient confidence in ChaCha as an algorithm, it's certainly possible for us to create an implementation that 1) implements all the necessary protections a secure implementation might want (e.g. protection against fork, protection against VM fork, memory locking, protection keys where available), 2) commits to fixing those if there's an issue with them, 3) commits to addressing future scenarios we haven't thought of yet, and 4) still provides the same outputs in future versions.

However, it's much much easier and more maintainable for the uses requiring secure randomness to call getrandom or equivalent (via DefaultRandomSource), and delegate secure randomness to the kernel and/or hardware. That means we don't have to take on that maintenance burden ourselves.

The point of this proposal is to have a "good enough" RNG that doesn't make some of those promises, and is easy to maintain, and provides seedability.

If people really want to name this ChaChaRandomSource we can. But I still think it's important to disclaim any potential claims about security that people might otherwise assume, and point people towards DefaultRandomSource for secure randomness. (I've expanded the doc comments in this proposal accordingly.)

pitaj commented 3 months ago

Is my understanding that this would fulfill the Math.random() use case?

the8472 commented 3 months ago

I don't think "insecure" is the important part of these. In fact, it is downright misleading since ChaCha is a CSPRNG.

So was RC4 in the distant past. The issue is that no practical cryptographic algorithm rests on proofs that the problem is outside P, and how could they, since P!=NP has not been proven yet. And many symmetric algorithms do not rest on mathematical proofs at all.

So even if we assume today that they're secure for practical purposes, we cannot promise so for the future. This is why josh is correct about the "pick any two" aspect.

If we promise seedability and stability then we should be prepared for the possibility that it one day will be considered insecure. At which point we would not want to be in a situation where users rely on it for security purposes, which means actively discouraging its use for such purposes.

CAD97 commented 2 months ago

Allowing "replay" of such games by providing a seed

I just want to note that a PRNG is likely a poor choice for this specific use case; noise based RNG (link: GDC talk) is what is actually appropriate. Specifically, PRNGs are limited to advancing their state forward. RNG derived from chaotic integer noise (essentially a hash function) on the other hand is random access, allowing replays to be jumped around without burning CPU time spinning the PRNG.

Of course other constraints might still prevent random seeks, and “keyframe” style solutions to those can be applied to the PRNG as well, but (since this is a notable use case for using deterministic RNG) this still bears mentioning.

the8472 commented 2 months ago

Not all RNGs are trapdoors, e.g. the ChaCha keystream is seekable and can double as an RNG. Block ciphers in counter mode would do too. It's mostly dedicated cryptographic RNG constructions that are trapdoors to provide forward secrecy, which isn't relevant for insecure ones.

Though you'd still need an API for seeking.

joshtriplett commented 1 month ago

@Amanieu Changed to DeterministicRandomSource, and added the unresolved question on get_seed.

Amanieu commented 1 month ago

We discussed this in the libs-api meeting today and are happy to accept this, with the get_seed issue to be resolved in the implementation PR.

joboet commented 1 month ago

I've started implementing this and wondered whether using ChaCha20 might be a little overkill. This generator should never be used for anything remotely security-sensitive (unless we want to guarantee resistance against timing-attacks, which I'm quite opposed to), so the computational complexity of ChaCha doesn't actually give us anything. Wouldn't a simpler RNG suffice, as long as it yields good-quality output?

programmerjake commented 1 month ago

you could use ChaCha8, which is ChaCha20 but with a lot less rounds: https://en.wikipedia.org/wiki/Salsa20#Reduced-round_ChaCha

programmerjake commented 1 month ago

another option is using AES-128 in CTR mode, which should be quite fast for the processors that include AES acceleration instructions. it's basically:

struct State(u128, u128);
fn aes_128(key: u128, input: u128) -> u128;
impl State {
    fn next(&mut self) -> u128 {
        let retval = aes_128(self.0, self.1);
        self.1 += 1;
        retval
    }
}

it has the added benefit that it's really easy to seek to any location in the output allowing you to generate output in parallel by copying the state to multiple threads and seeking each state to a different position.

joshtriplett commented 1 month ago

@joboet wrote:

I've started implementing this and wondered whether using ChaCha20 might be a little overkill. This generator should never be used for anything remotely security-sensitive (unless we want to guarantee resistance against timing-attacks, which I'm quite opposed to), so the computational complexity of ChaCha doesn't actually give us anything. Wouldn't a simpler RNG suffice, as long as it yields good-quality output?

That seems reasonable to me.

@programmerjake wrote:

you could use ChaCha8, which is ChaCha20 but with a lot less rounds

That sounds completely reasonable for the insecure generator.

another option is using AES-128 in CTR mode, which should be quite fast for the processors that include AES acceleration instructions

But probably slower for those that don't, and the generator should produce the same results on every target. AFAICT ChaCha-based algorithms will be substantially faster on anything without hardware AES support, and reasonably fast everywhere in any case.

programmerjake commented 1 month ago

something that could be even better than ChaCha8 or AES-128 in CTR mode is to use a counter-based random number generator such as the Squares RNG -- it has the benefits of being very fast, having high-quality output (though I'm assuming not cryptographic quality), and being O(1) seekable which makes parallelization much easier.

joshtriplett commented 1 month ago

@programmerjake I'm going to leave a call like that to cryptographers like @traviscross. My understanding is that ChaCha has had more scrutiny.

programmerjake commented 1 month ago

@programmerjake I'm going to leave a call like that to cryptographers like @traviscross. My understanding is that ChaCha has had more scrutiny.

yes, but if it's specifically insecure, why do you need cryptographic properties? I recognize that that means you have better randomness, but if it's good enough for everything but cryptography, much faster, and O(1) seekable (seekability is necessary for some applications), then imo that means it's good enough for std's insecure RNG.

joboet commented 1 month ago

Counter-based RNGs seem like a good option, they should allow more instruction-level parallelism. My only worry with Squares is that its quality depends on the key:

The key should be an irregular bit pattern with roughly half ones and half zeros.

so when users do something like DeterministicRandomSource::from_seed(0), they'll get unusable data. This constrains the number of usable keys, it's no longer really $2^64$. So perhaps some other CBRNG like Threefry or Philox would be better suited?

the8472 commented 1 month ago

Before worrying too much we should actually measure how many nanoseconds it takes to get a few bytes from chacha.

As long as it's unstable we can still change the algorithm.

Going with a reduced-round version is probably a good idea though to send the right message about security.

programmerjake commented 1 month ago

My only worry with Squares is that its quality depends on the key:

The key should be an irregular bit pattern with roughly half ones and half zeros.

so when users do something like DeterministicRandomSource::from_seed(0), they'll get unusable data.

Well, you could have the seed be used as the counter value instead of the key value (though that has problems that nearby seeds generate similar random sequences, just shifted in time), or could have the key be a good hash of the seed, where good means having the Avalanche effect. Maybe SipHash since we already have that in std and seeding the RNG is less performance critical than generating numbers or seeking?

programmerjake commented 1 month ago

Before worrying too much we should actually measure how many nanoseconds it takes to get a few bytes from chacha.

except that iirc ~~chacha is not seekable~~ (edit: chacha is seekable), which i think is an important property that our RNG should have.

programmerjake commented 1 month ago

Before worrying too much we should actually measure how many nanoseconds it takes to get a few bytes from chacha.

for comparison against squares-rng, for the squares32 variant I got compiler explorer to generate 1GiB of random output in 0.24s on one thread with clang -march=znver4 -O3 (you have to refresh a few times to get the AVX512 capable runner): https://gcc.godbolt.org/z/9xTYKn4bo (edit: rerunning again it got 0.19s)

I will be very surprised if ChaCha8 gets anywhere near that fast.

juntyr commented 1 month ago

I have used counter-based PRNGs in research and think they would be incredibly useful functionality-wise for an insecure seeded and reproducible PRNG:

reseeding, jumping, and forking are all trivial to support, we were in no way limiting the operations one can perform on the PRNG
the inner state, the counter, is just an integer that can be exposed as a public field or with setter methods
we can choose any hash function with good avalanching

hanna-kruppe commented 1 month ago

I will be very surprised if ChaCha8 gets anywhere near that fast.

I don't know what the clock speed is on those machines, but standard ChaCha8 on Zen 4 runs at <= 0.35 cycles per byte when you generate a lot of output at once. I don't know if that's using AVX2 or AVX-512. The slightly tweaked chacha8rand cuts out some unnecessary work without affecting security and my not very optimized AVX2 implementation (1024 byte buffer) achieves ca. 0.5 cycles per byte on a Skylake CPU and could likely go even faster with AVX-512. Other people report similar figures for plain ChaCha8 implemented with AVX2. I would consider that close enough.

ChrisDenton commented 1 month ago

AVX2

Can we rely on the standard library being built with AVX2?

when you generate a lot of output at once

What about when generating a lot of small output? I would naively believe many users would use many short outputs rather than large chunks at a time. Sure, std can of course buffer the output but then that raises the complexity of storing a reasonably sized buffer somewhere and managing it.

hanna-kruppe commented 1 month ago

AVX2

Can we rely on the standard library being built with AVX2?

No mainstream x86 target enables it by default. Runtime feature detection in core (https://github.com/rust-lang/rfcs/pull/3469) would enable using AVX2 anyway. But ChaCha8 with 128b SIMD is already pretty fast so wider SIMD isn’t very important unless you need multiple GB/s of random data (the vast majority of programs don’t).

What about when generating a lot of small output? I would naively believe many users would use many short outputs rather than large chunks at a time. Sure, std can of course buffer the output but then that raises the complexity of storing a reasonably sized buffer somewhere and managing it.

Any design based on ChaCha or a block cipher will want some buffering because the minimum amount of data you can generate at once is several times larger than the four or eight bytes that’s often requested.

Having a buffer is different from statistical RNGs that generate one u32 or u64 at a time and technically some extra state, but after implementing it myself I think it’s actually a better fit for the gen_bytes style interface. Without next_uN style methods in the mix, you can just store the buffer as a [u8; N] field and treat all read sizes uniformly: while more bytes are needed, refill buffer if empty, then copy min(requested, available) bytes into dest and update lengths/pointers. No need to think about reads that aren’t a multiple of your word size, fix up byte order if you care about consistent results across platforms, and no temptation to add a different code path for large reads where you might want loop unrolling or SIMD even for a “word at a time” generator.

the8472 commented 1 month ago

Note that there's a tracking issue, we should continue the discussion there: https://github.com/rust-lang/rust/issues/131606

rust-lang / libs-team

Simple seedable insecure random number generation, stable across Rust versions #394

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

What happens now?

Possible responses