Closed joshtriplett closed 5 days ago
impl Send for Rng; impl !Sync for Rng;
why can't Rng
be Sync
? if all methods that modify state take &mut self
then being Sync
should be fine
So the Rng
type is only meant for the insecure generation and the free function is meant for secure generation?
That means there's no common abstraction for them if one wants to use a seedable RNG in tests and a real one in production.
So the
Rng
type is only meant for the insecure generation and the free function is meant for secure generation? That means there's no common abstraction for them if one wants to use a seedable RNG in tests and a real one in production.
I put that in future work: it's possible we might want to have a common type, or alternatively a common trait and two different types (so that the type system ensures you don't use an insecure RNG where you wanted a secure one). But that is likely to require further design, and I would like to leave that out of the initial design to keep the initial design simple. Past efforts to add an RNG have run into endless design issues and tradeoffs. The smaller the surface area, the fewer of those issues come up.
impl Send for Rng; impl !Sync for Rng;
why can't
Rng
beSync
? if all methods that modify state take&mut self
then beingSync
should be fine
You're entirely right; fixed.
Do we need a more descriptive name than Rng
? I'm nervous about claiming such a generic name for the insecure version. And if this is to be stable for all time then I think that the rng algorithm becomes an API question.
Also shouldn't this have a way to fill a buffer using the system crng? It's less convenient but it's the most basic building block that platforms provide so it may be worth having a cross-platform version exposed.
Also shouldn't this have a way to fill a buffer using the system crng? It's less convenient but it's the most basic building block that platforms provide so it may be worth having a cross-platform version exposed.
I don't think we should make any guarantees that something is using the "system" RNG, just that it's using some cryptographically secure RNG.
I do agree that "fill this buffer with randomness" is a useful building block, but the goal of this ACP was the simple end-user-targeted interface, since existing libraries already have those building blocks.
And if this is to be stable for all time then I think that the rng algorithm becomes an API question.
The intent is that the seeded, insecure version is an API guarantee. The secure version makes no guarantees about algorithm, but the seeded version does.
And if this is to be stable for all time then I think that the rng algorithm becomes an API question.
The intent is that the seeded, insecure version is an API guarantee. The secure version makes no guarantees about algorithm, but the seeded version does.
Sure, picking an algorithm for the insecure version is what I mean. That would need to be part of the proposal as it's part of the API, no?
I am in favor of the general idea here.
A domain question for folks using secure RNG: how do you write tests for the behavior of routines that make use of the secure RNG? Do you do it by providing it with a seed? I think the API as proposed would be insufficient for that because there's no way to create a secure RNG with a seed.
I have some initial thoughts:
(This would require deciding whether to have type-level separation between secure and seeded Rng states.)
I kinda lean towards this in the affirmative. That is, that we should have type-level separation. The use cases are different enough that being able to say "this type is always going to do 'secure' rng" is likely quite valuable. I'm not sure we need to commit to this in the initial design, but I think we should leave room for it. For example, perhaps by renaming Rng
to InsecureRng
or whatever.
Stability between versions of Rust when seeded
This is a little concerning to me because it feels like a very strong guarantee that will lock us into something for eternity. But, I confess, I am a bit ignorant here, and perhaps this isn't as big of a deal as I think it is. Do other language ecosystems guarantee stability of seeded insecure RNGs?
how do you write tests for the behavior of routines that make use of the secure RNG? Do you do it by providing it with a seed?
With a pluggable RNG implementing a trait. In rand
Rng is a trait
I've asked this in https://github.com/rust-lang/libs-team/issues/393#issuecomment-2160380135 and here the reply https://github.com/rust-lang/libs-team/issues/393#issuecomment-2160388270
@ChrisDenton wrote:
And if this is to be stable for all time then I think that the rng algorithm becomes an API question.
The intent is that the seeded, insecure version is an API guarantee. The secure version makes no guarantees about algorithm, but the seeded version does.
Sure, picking an algorithm for the insecure version is what I mean. That would need to be part of the proposal as it's part of the API, no?
You're right, I should spell that out more explicitly. My intention was that we initially use the same RNG implementation for both, but if any security issue ever arises that requires us to change the RNG implementation, we'll change the secure one and leave the insecure one.
@BurntSushi wrote:
A domain question for folks using secure RNG: how do you write tests for the behavior of routines that make use of the secure RNG? Do you do it by providing it with a seed? I think the API as proposed would be insufficient for that because there's no way to create a secure RNG with a seed.
As @the8472 wrote, likely by using a trait.
I have some initial thoughts:
(This would require deciding whether to have type-level separation between secure and seeded Rng states.)
I kinda lean towards this in the affirmative. That is, that we should have type-level separation. The use cases are different enough that being able to say "this type is always going to do 'secure' rng" is likely quite valuable. I'm not sure we need to commit to this in the initial design, but I think we should leave room for it. For example, perhaps by renaming
Rng
toInsecureRng
or whatever.
I think I'm convinced by this, yeah: let's keep the types distinct, and then we can provide a trait for any code that wants to be generic over the RNG. But I still think this should be future work, to avoid increasing the initial surface area we need to get consensus on.
Stability between versions of Rust when seeded
This is a little concerning to me because it feels like a very strong guarantee that will lock us into something for eternity. But, I confess, I am a bit ignorant here, and perhaps this isn't as big of a deal as I think it is. Do other language ecosystems guarantee stability of seeded insecure RNGs?
To the extent other languages provide stability guarantees, yes, some other languages/libraries do provide seeded RNGs and guarantee that the same seed produces the same sequence of values. For instance, Python provides an insecure seedable RNG in the random
module, and separately provides secure random number generation in the secrets
module which does not support seeding.
Should the random
function take a range, or is that left to future additions? Generating a uniform random value within a range is important especially for secure rng because otherwise people might use %
(introducing bias).
To me it feels like we are trying to go too high level too quickly, especially with the Random
trait that opens up all sorts of questions about ranges and distributions support. And even how slices would work with it. Wouldn't it make sense to cut this down (for now) to just upstream the getrandom
crate, or in other words the following API for just filling a buffer with cryptographically secure randomness?
pub fn getrandom(dest: &mut [u8]) -> Result<(), Error>;
and / or:
pub fn getrandom_uninit(dest: &mut [MaybeUninit<u8>]) -> Result<&mut [u8], Error>;
It feels like a big mistake not to support slices, as you would otherwise need to create a loop to fill the buffer, causing a huge syscall overhead that isn't necessary, because the underlying OS primitives all (from what I'm seeing) support buffers. In fact std internally already has its own getrandom
abstraction (imp::fill_bytes
) that it uses for secure randomness for all the platforms. Together with the rand
crate being built on top of the getrandom
crate this highlights that this is really the core cross platform abstraction that you really need in the end. Everything else is just nice to have on top.
I'm certainly not opposed to having more higher level APIs available in the initial API as well, but a getrandom
equivalent is imo an absolute must.
@pitaj wrote:
Should the
random
function take a range, or is that left to future additions? Generating a uniform random value within a range is important especially for secure rng because otherwise people might use%
(introducing bias).
That's in the future work section. I absolutely think we should, but I'm trying to minimize the initial surface area.
To me it feels like we are trying to go too high level to quickly, especially with the
Random
trait that opens up all sorts of questions about ranges and distributions support. Wouldn't it make sense to cut this down (for now) to just upstream thegetrandom
crate, or in other words the following API for just filling a buffer with cryptographically secure randomness?pub fn getrandom(dest: &mut [u8]) -> Result<(), Error>;
That solves a completely different problem. That's a building block that would live underneath the higher-level interface most people are actually looking for, and it wouldn't solve the problem most people are pulling in crates for. An order of magnitude more crates depend on rand
than on getrandom
.
It feels like a big mistake not to support slices
The initial RFC does support arrays: you can do let array: [u8; 4096] = random();
. And we can implement that reasonably efficiently.
Will the stable RNG be stable across different targets? (especially bit-size and endianness)
@cuviper It should be, yes. I'll document that.
@rust-lang/libs-api I've now split this ACP.
This ACP provides only the simple secure random number generation. It has a mention, in the "Future work" section, of a possible "phase 2" design for a seedable secure RNG.
The separate ACP #394 provides the seedable insecure random number generation that's stable across Rust versions.
as a proposed algorithm, I think https://en.wikipedia.org/wiki/Fortuna_(PRNG) could be good. it would collect entropy from rdrand
-style instructions (if available, with a way to disable using them since some people think they're backdoored) and from the OS. imo rdrand, even if backdoored, is just going to make the RNG more random, since they're not the only entropy source.
it might be nice to also have an api for adding more entropy (no actual entropy estimation needed, which I think is one of the benefits of Fortuna).
What led you to pick this working on integers & bool, rather than exposing something more raw? For example, one basic version of this would be to offer an impl Read
that let you fill [u8]
s, and that would seem helpful for things like wasm targets as it would be able to return io::Error
s, since there's often no secure randomness available.
For the implementation side, do we really need to maintain something in std? Can we just be a thin layer around CryptGenRandom
//dev/urandom
/whatever? Obviously something user-mode with a bunch of state could be faster, but it's also bigger, so maybe it's fine to say "look, this will give you the basic entropy access, which you can use directly if you have simple needs, but if you want more complicated things or different performance trade-offs, get a crate". I'd much rather delegate the "secure" part to the OS if possible (that can be patched outside what we do), since getting a fresh "yup we're secure" audit for a standard library implementation every release seems impractical. Or, said from the other direction, if I care about security being able to pin a crate version across rust releases sounds like a good thing.
The solution sketch here mentions "thread-local" in the documentation for the function. Is that observable at all? Is there a way that the caller could tell anything different between it being thread-local or global and mutexed?
relying on the OS and not implementing your own CSRNG may be best, plus some OSes go to great lengths to make getting randomness fast, e.g. https://www.phoronix.com/news/Linux-Random-vDSO-2024
@scottmcm wrote:
What led you to pick this working on integers & bool, rather than exposing something more raw? For example, one basic version of this would be to offer an impl Read that let you fill [u8]s,
This interface also allows writing let buf: [u8; 1024] = random();
if you'd like. ("Fill an existing buffer" is in the future work section.) The reason I'm proposing something based on trait Random
is to provide usability and support a variety of potential types. (The Random
trait is intentionally opaque for now.)
I don't think providing exclusively an interface for filling buffers will serve many potential users; with such an interface, most users would end up having to wrap it for usability, at which point they're using an external crate anyway, so we didn't solve the underlying problem of "why do I need an external crate just to get a random number?". I think we need
and that would seem helpful for things like wasm targets as it would be able to return io::Errors, since there's often no secure randomness available.
I would argue that most applications with a dependency on secure randomness are not prepared to do without it, and panicking is the appropriate response. We could add fallible interfaces like rand
's try_fill
in the future, but in practice I think those will see substantially less use.
For the implementation side, do we really need to maintain something in std? Can we just be a thin layer around
CryptGenRandom
//dev/urandom
/whatever?
We can use any implementation we want. That includes directly using OS randomness (e.g. on a target that provides it via something like the Linux VDSO) if available, which has multiple advantages, including support for unusual things like vmfork operations (forking a virtual machine and reseeding randomness). This ACP is not attempting to commit to any particular implementation strategy. I do expect, in practice, that there may exist some targets for which we need to use a seeded CSPRNG, because we can get small amounts of randomness but not large amounts.
The solution sketch here mentions "thread-local" in the documentation for the function. Is that observable at all? Is there a way that the caller could tell anything different between it being thread-local or global and mutexed?
I rewrote the doc comment. For the secure RNG, there's no way to tell the difference. In practice, I don't think we should use global-and-mutexed for performance reasons, but that's an issue of implementation quality, not something this ACP is trying to mandate.
(The difference would only arise for seedable RNGs, which this ACP is not proposing.)
I'm in favour of a high level API but if we then have two APIs that use an OS entropy source (Rng and RandomState) then it becomes increasingly silly that we don't expose direct access to it. So it'd be good to have a raw entropy_source
API.
@ChrisDenton To be clear, I'm not objecting to exposing an interface for "fill this array with random bytes" or similar. I'd love to see us add that interface. I just don't think it's the very next interface we should add.
I would argue that most applications with a dependency on secure randomness are not prepared to do without it, and panicking is the appropriate response.
I agree with that position, but I also recall people saying they're fine with stubbing things in std
that return Result
, but not stubbing things by panicking in things that don't return results. So I don't know how to deconflict that.
I don't think providing exclusively an interface for filling buffers will serve many potential users; with such an interface, most users would end up having to wrap it for usability, at which point they're using an external crate anyway, so we didn't solve the underlying problem of "why do I need an external crate just to get a random number?".
The piece it would solve is problems like "wait, why does cargo geiger say I need a crate using unsafe just for random numbers?", though.
In some ways, I think the "well I need to wrap that" could be considered a good thing, because due to things like
Providing a correct implementation will steer people away from the most common open-coded implementation (using %), which introduces bias.
that you mention above, the answer with this ACP would still be "you should get a crate for random numbers".
If the goal is to change the answer on URLO/discord from "use the rand crate", then I think std absolutely needs all the things like https://en.cppreference.com/w/cpp/numeric/random/uniform_int_distribution so you can do things properly with std.
I mention the filling buffers version because it gives the answer of "well, you can do it yourself with fill_buf
+from_ne_bytes
if you really want, but you probably want the rand
crate's wrappers to make it nice", which seems plausibly useful without needing to figure out the whole distribution and state APIs.
I agree with that position, but I also recall people saying they're fine with stubbing things in
std
that returnResult
, but not stubbing things by panicking in things that don't return results. So I don't know how to deconflict that.
what about having a cfg for that? it could be kinda like cfg(target_has_atomic = "8")
except for library optional features.
e.g.:
#[cfg(target_has_library_feature = "secure_random")]
pub fn generate_rsa_key() -> RsaKey {
let mut primes;
loop {
primes = random();
if is_valid_rsa_primes(primes) {
break;
}
}
RsaKey::from_unchecked(primes)
}
I've revised this again based on feedback from the last review, in collaboration with @Amanieu.
This now defines concrete non-sealed Random
and RandomSource
traits, which other crates can implement. This allows the ecosystem to extend this, and we can leave more esoteric needs to the ecosystem.
pub trait Random { fn random(random_source: &mut impl RandomSource) -> Self; }
let's not repeat the same mistake of std::hash::Hash
, please make it
pub trait Random {
fn random(random_source: &mut (impl RandomSource + ?Sized)) -> Self;
// or `fn random<S: RandomSource + ?Sized>(random_source: &mut S) -> Self`
}
pub fn random<R: Random>() -> R { <R as Random>::random(DefaultRng) }
I suppose you mean <R as Random>::random(&mut DefaultRng)
here?
@kennytm: Good call on both counts. Fixed.
LGTM! Nice work @joshtriplett and @Amanieu!
The proposal still talks quite a bit about eliminating thread local state in alternatives - but if I understand correctly, there's no proposed caching or other thread local state in the design? Essentially, we're committing to a global cache of the discovered default source of randomness or not caching it at all?
@Mark-Simulacrum I've fixed the couple of references to "thread-local" to be more general; the point of that mention in the alternatives section was that we could drop random()
and require people to pass DefaultRng
explicitly, but that'd be inconvenient for common cases.
The implementation of DefaultRng
has complete freedom to use any mechanism internally. It could use a thread-local CSPRNG, or a Mutex-guarded CSPRNG, or read random bytes from the OS. I would expect the choice of implementation to depend on the target. If a target's OS-provided randomness is already sufficiently fast (e.g. a VDSO-accelerated getrandom
on Linux), we could call it directly; if not, we could use a thread-local CSPRNG seeded by OS randomness.
The question is also what's the right point to panic/Result.
With CSPRNGs the right point to error on initialization. Afterwards they should be infallible. OS-RNGs on the other hand can fail on every call.
We discussed this in the @rust-lang/libs-api meeting today and we are happy to accept this as an experimental API.
API design partially based on discussions with @BartMassey. Revised based on feedback, in particular from @Amanieu.
Proposal
Problem statement
People regularly want to generate random numbers for a variety of use cases. Doing so currently requires using a crate from
crates.io
.rand
is the 6th most downloaded crate oncrates.io
, andfastrand
is quite popular as well. Many, many people over the years have expressed frustration that random number generation is not available in the standard library (despite the standard library using randomness internally forHashMap
).There are multiple reasons why we have not historically added this capability. Primarily, there are three capabilities people want, and those capabilities seem to present a "pick any two" constraint:
These constraints arise from the possibility of a secure random number generator potentially requiring updates for security reasons. Changing the random number generator would result in different sequences for the same seed.
In addition to that primary constraint, there have also been design difficulties: there are numerous pieces of additional functionality people may want surrounding random number generation, which makes any proposal for it subject to massive scope creep and bikeshed painting. Most notably: users of random numbers may want to represent the state of the RNG explicitly as something they can pass around, or implicitly as global state for simplicity.
This ACP proposes a solution that aims to be as simple as possible, satisfy all the stated constraints on the problem, and allow for future expansion if desired. This ACP proposes a generator that is secure, and potentially seedable in the future, but not stable across versions of Rust; this allows us to update the secure RNG if any issue or potential improvement arises.
Separately, ACP 394 proposes an RNG that is seedable and guarantees identical seeding across Rust versions, but is not a secure RNG.
Motivating examples or use cases
Solution sketch
The trait
Random
will initially be implemented for all iN and uN integer types,isize
/usize
, andbool
, as well as arrays and tuples of such values.Notably,
Random
will not initially be implemented for floating-point values (to initially avoid questions of distribution), strings/paths/etc (to avoid questions of length and character selection), orchar
(to avoid questions of what subset of values to generate). We can consider such additions in the future; see the section on future work below.The random number generator may use OS randomness directly, if available, or use a CSPRNG seeded from OS/hardware randomness.
Alternatives
We could do nothing, and continue to refer people to external crates like
rand
andfastrand
.We could eliminate the convenience functions that implicitly use
DefaultRng
, and require all users to pass around a RandomSource directly. However, this would add complexity to many simple use cases.We could allow
gen_bytes
to fill an uninitialized buffer. This would be a more complex interface, however. We could potentially introduce such an interface later, when we've stabilized the types to make it simpler.We could use
Read
to get data from random sources (e.g.trait RandomSource: Read
). This would have the advantage of letting us useread_buf
once we stabilize that. However, this would prevent us from puttingRandomSource
incore
.We could allow
gen_bytes
to fail and return aResult
. However, it seems unlikely that most consumers of randomness would be able to handle not having access to it. The higher-level functions will almost certainly want to panic rather than returning a result, for convenience of invocation, and having callers who can deal with an absence of randomness call the low-level interface directly doesn't seem like a sufficiently useful interface to support. We also don't want to introduce a whole family of paralleltry_random
/try_random_range
/etc functions. We could always introduce atry_gen_bytes
interface later, if there's a need for one. (In addition, we would not want to useio::Result
here, as that would preventRandomSource
from living incore
.)We could rename
Random::random
toRandom::random_with
or similar, and userandom
for a function that callsrandom_with(DefaultRng)
. This would make it convenient to call, for instance,u32::random()
. This doesn't seem like an important convenience, however, as it's just as easy to callrandom::<u32>()
(or in most contexts justrandom()
with type inference).Future work
We should support randomly shuffling arrays:
If we need it for performance, we could add functions like
gen_u64
orgen_u32
toRandomSource
(with default implementations based ongen_bytes
), to help RNGs that can implement those more efficiently than they can implementgen_bytes
. This is inspired byHasher
doing something similar. I propose that we initially have justgen_bytes
, and require benchmarks demonstrating a substantial speedup before we consider the additional complexity and interface surface area.We should support random generation in ranges (e.g.
random_range(1..6)
). Providing a correct implementation will steer people away from the most common open-coded implementation (using%
), which introduces bias.We could support choosing a random item from an iterator. (This can be done more optimally when the iterator implements some additional traits.)
We can implement
Random
for many more types, such asNonZero<T>
andWrapping<T>
.We should support
derive(Random)
for types whose fields all implementRandom
. This is relatively straightforward for structs. However, supporting this for enums would require some additional care to handle discriminants: should a randomOption<u8>
be 50%None
, or 1/257None
? The latter is much more difficult to implement correctly.We could add additional
RandomSource
implementations to the standard library, including seeded sources. This would enable purposes such as testing, reproducing the creation of objects whose algorithms require randomness to generate, replaying processes such as fuzz testing, supporting game seeds or replays, and various others. Note that this would not be the same type asDefaultRng
.We should not allow seeding the
DefaultRng
state, as that would affect all users rather than only affecting those that are opting into supporting seeding, and would also preclude designs that don't involve a seed (e.g. obtaining random numbers directly from the OS).We could consider, in the future, introducing random float generation, or random character generation, or generation of various other types. A careful design could allow using the same mechanisms for this as for random generation in ranges (e.g. a
Distribution<T>
trait to sample from). Alternatively, we could leave the full breadth of this to the ecosystem, and just support random ranges directly, as well as some common specific ways to generate floats and characters.We could provide a trait to fill an existing value rather than creating a new one.
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: