rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand

async rand_core trait #1155

Closed newAM closed 5 days ago

newAM commented 2 years ago

Background

The embedded-hal crate provides traits for embedded hardware. In #291 embedded-hal removed their RNG trait in favor of rand_core. embedded-hal is now working on adding async traits using GATs in #285.

This issue is really more of a question than a feature request: Is rand_core the appropriate place to add async RNG traits? If there are use-cases for async RNG traits outside of embedded Rust, I think it would be a good idea to include this in rand_core; otherwise it would probably be more appropriate for embedded-hal.

What is your motivation?

Developing async on embedded targets has been progressing nicely, and with GAT stabilization on the way it will soon be possible to write #![no_std]-friendly async traits on stable Rust.

HW RNG peripherals are fast, but can still benefit from async methods that allow the CPU to yield to other tasks while waiting for the hardware to generate entropy. On the STM32WL (ARM Cortex-M4), generating a [u32; 4] block of entropy takes ~2662 CPU cycles on a cold-boot, and ~412 cycles when a steady state is reached. An async context switch takes a minimum of 62 cycles using the embassy executor.

What type of application is this?

Hardware entropy generation for embedded development on #![no_std] targets.

Feature request

Add an async RNG trait.

This would look something like this (prior art from embassy-rs):

use core::future::Future;

/// Random-number Generator
pub trait Rng {
    type Error;

    type RngFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    /// Completely fill the provided buffer with random bytes.
    ///
    /// May result in delays if entropy is exhausted prior to completely
    /// filling the buffer. Upon completion, the buffer will be completely
    /// filled or an error will have been reported.
    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a>;
}
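
As a rough illustration of how the trait above might be implemented and driven, here is a minimal sketch. `MockHwRng`, `Fill`, `poll_to_completion`, and `demo` are hypothetical names that appear nowhere in embassy or rand_core. The mock is a plain counter rather than an entropy source, and it uses `std` so that it can run on a host. It hands out one 4-byte word per poll to imitate waiting on the peripheral between words.

```rust
use std::convert::Infallible;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// The proposed trait, repeated so this sketch compiles on its own.
pub trait Rng {
    type Error;

    type RngFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;

    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a>;
}

/// Hypothetical stand-in for a hardware RNG: a counter, not real entropy.
pub struct MockHwRng {
    counter: u8,
}

/// Future returned by `MockHwRng::fill_bytes`; writes one 4-byte word per poll.
pub struct Fill<'a> {
    rng: &'a mut MockHwRng,
    dest: &'a mut [u8],
    filled: usize,
}

impl<'a> Future for Fill<'a> {
    type Output = Result<(), Infallible>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut(); // `Fill` is `Unpin`, so this is allowed
        let start = this.filled;
        let end = (start + 4).min(this.dest.len());
        for byte in &mut this.dest[start..end] {
            this.rng.counter = this.rng.counter.wrapping_add(1);
            *byte = this.rng.counter;
        }
        this.filled = end;
        if end == this.dest.len() {
            Poll::Ready(Ok(()))
        } else {
            // Pretend the next word is not ready yet and ask to be polled again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

impl Rng for MockHwRng {
    type Error = Infallible;
    type RngFuture<'a> = Fill<'a> where Self: 'a;

    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a> {
        Fill { rng: self, dest, filled: 0 }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` until it completes and reports how many polls that took.
/// A real executor would run other tasks between polls instead of looping.
pub fn poll_to_completion<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
    }
}

/// Fills one `[u32; 4]`-sized block (16 bytes) and returns it with the poll count.
pub fn demo() -> ([u8; 16], u32) {
    let mut rng = MockHwRng { counter: 0 };
    let mut buf = [0u8; 16];
    let (result, polls) = poll_to_completion(rng.fill_bytes(&mut buf));
    result.unwrap();
    (buf, polls)
}
```

Each `Pending` result is a point where an executor could switch to another task. That is the benefit the proposal is after.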

vks commented 2 years ago

I think it makes sense to add such a trait to rand_core, just to make interoperability easier.

There are some open questions about how this would interact with the existing traits. For instance: Can we implement Rng for all types that implement AsyncRng?

josephlr commented 2 years ago

One other question is if rand_core or rand should have any types that implement RngAsync. The only ones I can think of would be our various Rng adapters like:

It might actually make more sense to have an async version of BlockRngCore. This abstraction would seem to better match the underlying hardware. It would also discourage using the (presumably slow) hardware RNG directly, and instead incentivize using it through a SeedableRng or ReseedingRng.

dhardy commented 2 years ago

> This issue is really more of a question than a feature request: Is rand_core the appropriate place to add async RNG traits?

So, will there be direct interoperability between async and synchronous RNGs? If so this may make sense; if not a dedicated crate may be preferable(?).

> Can we implement Rng for all types that implement AsyncRng?

How? Technically, yes, by spinning until poll is ready, but futures are usually waited on by an executor. But there is no standard executor and this is not a good place to be opinionated.
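
For concreteness, the "spinning until poll is ready" approach could look roughly like the sketch below. All names are hypothetical: `AsyncRng` and `SyncRng` are local stand-ins rather than rand_core items, and `async fn` in traits (stable since Rust 1.75) is used as shorthand for the GAT-based signature. The waker does nothing, so the loop burns CPU until the future completes, which shows why this is a poor default.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical async trait, written with `async fn` for brevity.
#[allow(async_fn_in_trait)]
pub trait AsyncRng {
    async fn fill_bytes(&mut self, dest: &mut [u8]);
}

/// Hypothetical synchronous trait standing in for `rand_core::RngCore`.
pub trait SyncRng {
    fn fill_bytes(&mut self, dest: &mut [u8]);
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Adapter that makes an async RNG usable synchronously by busy-polling it.
pub struct Spinning<R>(pub R);

impl<R: AsyncRng> SyncRng for Spinning<R> {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let mut fut = pin!(AsyncRng::fill_bytes(&mut self.0, dest));
        // No executor: keep polling until the future reports completion.
        while fut.as_mut().poll(&mut cx).is_pending() {
            std::hint::spin_loop();
        }
    }
}

/// Future that is pending once, modelling a single wait on hardware.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hypothetical async RNG: a counter, not real entropy.
pub struct Mock(pub u8);
impl AsyncRng for Mock {
    async fn fill_bytes(&mut self, dest: &mut [u8]) {
        YieldOnce(false).await;
        for byte in dest.iter_mut() {
            self.0 = self.0.wrapping_add(1);
            *byte = self.0;
        }
    }
}

pub fn demo() -> [u8; 4] {
    let mut rng = Spinning(Mock(0));
    let mut buf = [0u8; 4];
    SyncRng::fill_bytes(&mut rng, &mut buf);
    buf
}
```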

The reverse, implementing AsyncRng for every RngCore, would be easy, and perhaps makes more sense: users of RngCore will block until a result is yielded; users of AsyncRng can use their executor for concurrency.

Note that if we do this, a type cannot directly support both async and sync usage. But if we don't, an adapter is required to use a sync RNG in an async function; this is probably fine, thus it may be better not to have any auto impl.
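
The adapter route could be sketched as follows. `AsAsync`, `SyncRng`, `Counter`, and `demo` are hypothetical names, and `AsyncRng` here is a local copy of the trait proposed in this issue rather than an existing rand_core item. The wrapper does the work eagerly and returns an already completed future. Because it is a separate wrapper type and not a blanket impl, a type remains free to implement both traits itself.

```rust
use std::convert::Infallible;
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Hypothetical synchronous trait standing in for `rand_core::RngCore`.
pub trait SyncRng {
    fn fill_bytes(&mut self, dest: &mut [u8]);
}

/// Local copy of the async trait proposed in this issue.
pub trait AsyncRng {
    type Error;
    type RngFuture<'a>: Future<Output = Result<(), Self::Error>> + 'a
    where
        Self: 'a;
    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a>;
}

/// Wrapper exposing a synchronous RNG through the async trait.
pub struct AsAsync<R>(pub R);

impl<R: SyncRng> AsyncRng for AsAsync<R> {
    type Error = Infallible;
    type RngFuture<'a> = Ready<Result<(), Infallible>> where Self: 'a;

    fn fill_bytes<'a>(&'a mut self, dest: &'a mut [u8]) -> Self::RngFuture<'a> {
        // The buffer is filled here, before the future is ever polled.
        SyncRng::fill_bytes(&mut self.0, dest);
        ready(Ok(()))
    }
}

/// Hypothetical sync RNG: a counter, not real entropy.
pub struct Counter(pub u8);
impl SyncRng for Counter {
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        for byte in dest.iter_mut() {
            self.0 = self.0.wrapping_add(1);
            *byte = self.0;
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns the filled buffer and whether the future was ready on its first poll.
pub fn demo() -> ([u8; 4], bool) {
    let mut rng = AsAsync(Counter(0));
    let mut buf = [0u8; 4];
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let ready_on_first_poll = {
        let mut fut = AsyncRng::fill_bytes(&mut rng, &mut buf);
        let is_ready = Pin::new(&mut fut).poll(&mut cx).is_ready();
        is_ready
    };
    (buf, ready_on_first_poll)
}
```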

> It might actually make more sense to have an async version of BlockRngCore

Is your point that ReseedingRng could still implement sync behaviour by requesting a fresh seed in a future which is polled on each request for bytes, only doing the actual reseeding once poll returns Ready? That might work (perhaps with some limit before it blocks, for security reasons).

Or is it simply that derived RNGs might implement both RngCore and AsyncRngCore depending on what their underlying RNG implements? Sure.
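
The first reading (poll a seed future on every request and only reseed once it is ready) might look roughly like this sketch. Everything here is hypothetical: `LazyReseeding` is not the real `ReseedingRng`, `ToyPrng` is a xorshift64 generator that is not cryptographically secure, and `SlowSeed` stands in for an async hardware seed source. The limit on how long output may continue before blocking, mentioned above, is not implemented.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy xorshift64 generator: NOT cryptographically secure, illustration only.
struct ToyPrng(u64);
impl ToyPrng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Sync generator that polls a pending seed future once per request.
pub struct LazyReseeding<F> {
    prng: ToyPrng,
    pending_seed: Option<Pin<Box<F>>>,
    pub reseeds: u32,
}

impl<F: Future<Output = u64>> LazyReseeding<F> {
    pub fn new(seed: u64) -> Self {
        // `| 1` keeps the xorshift state non-zero.
        Self { prng: ToyPrng(seed | 1), pending_seed: None, reseeds: 0 }
    }

    pub fn request_reseed(&mut self, seed_future: F) {
        self.pending_seed = Some(Box::pin(seed_future));
    }

    /// Never blocks: output continues from the old state until the seed arrives.
    pub fn next_u64(&mut self) -> u64 {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let polled = match self.pending_seed.as_mut() {
            Some(fut) => fut.as_mut().poll(&mut cx),
            None => Poll::Pending,
        };
        if let Poll::Ready(seed) = polled {
            self.prng = ToyPrng(seed | 1);
            self.pending_seed = None; // a completed future must not be polled again
            self.reseeds += 1;
        }
        self.prng.next_u64()
    }
}

/// Hypothetical seed source that needs `polls_left` polls before it is ready.
pub struct SlowSeed {
    pub polls_left: u32,
    pub seed: u64,
}
impl Future for SlowSeed {
    type Output = u64;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u64> {
        if self.polls_left == 0 {
            Poll::Ready(self.seed)
        } else {
            self.polls_left -= 1;
            Poll::Pending
        }
    }
}

/// Records the reseed count after each of three requests.
pub fn demo() -> [u32; 3] {
    let mut rng = LazyReseeding::new(1);
    rng.request_reseed(SlowSeed { polls_left: 2, seed: 42 });
    let mut trace = [0u32; 3];
    for slot in trace.iter_mut() {
        let _ = rng.next_u64();
        *slot = rng.reseeds;
    }
    trace
}
```

In this sketch the first two requests are served from the old state and the reseed takes effect on the third.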


Another question: should getrandom support async usage? If so we can have AsyncOsRng (or OsRngAsync). But this doesn't need to be answered now.

newAM commented 2 years ago

> Can we implement Rng for all types that implement AsyncRng?

> The reverse, implementing AsyncRng for every RngCore, would be easy, and perhaps makes more sense: users of RngCore will block until a result is yielded; users of AsyncRng can use their executor for concurrency.

> Note that if we do this, a type cannot directly support both async and sync usage. But if we don't, an adapter is required to use a sync RNG in an async function; this is probably fine, thus it may be better not to have any auto impl.

This is a design decision, but personally I would keep these separate. The choice to impl an async trait vs a sync trait should be representative of what the underlying hardware/code is doing.


> It might actually make more sense to have an async version of BlockRngCore. This abstraction would seem to better match the underlying hardware.

This would be a good fit for the hardware RNG I am currently working with. Hopefully other embedded users can comment on what the ideal trait would be.


> It would also discourage using the (presumably slow) hardware RNG directly, and instead incentivize using it through a SeedableRng or ReseedingRng.

I should explain the cycle counts a bit more: the question asked in the embedded-rust Matrix chat was whether async RNG traits make sense at all. Polling is faster than async when a context switch takes longer than the time spent polling the hardware for completion. Based on the numbers I have available, I do think there are valid use-cases for async RNG traits, but they are definitely more specialized.
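
To make the break-even reasoning concrete, here is a small sketch using the figures from the original post (412 and 2662 cycles per block, 62 cycles minimum per context switch). The function names are hypothetical, and the count of two switches per await (one away from the task, one back) is an assumption, not a measurement.

```rust
/// Cycles left over for other tasks if the CPU yields during the wait.
/// Returns 0 when switching costs at least as much as simply polling.
pub fn cycles_freed(wait_cycles: u32, switch_cycles: u32, switches: u32) -> u32 {
    wait_cycles.saturating_sub(switch_cycles * switches)
}

/// True when yielding frees at least one cycle compared with busy-polling.
pub fn async_pays_off(wait_cycles: u32, switch_cycles: u32, switches: u32) -> bool {
    cycles_freed(wait_cycles, switch_cycles, switches) > 0
}
```

Under that assumption, a steady-state block leaves 412 - 124 = 288 cycles for other work and a cold-boot block leaves 2662 - 124 = 2538. A wait shorter than 124 cycles would be better served by polling.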

Sidetracking for a moment; the STM32WL hardware is quite fast as compared to software algorithms.

| Source         | Cycles per [u32; 4] |
|----------------|---------------------|
| ChaCha20Rng    | 2,875               |
| ChaCha12Rng    | 1,764               |
| ChaCha8Rng     | 1,216               |
| STM32WL HW RNG | 412                 |

That being said there are still valid reasons to use software random number generation when hardware acceleration is available:

  1. Hardware RNGs can fail, whereas most(?) software RNGs are infallible after successfully seeding with hardware.
  2. Code portability
  3. Concurrent random number generation

dhardy commented 2 years ago

So, my current understanding from the above:

dhardy commented 5 days ago

I can add a fourth reason to the above list to prefer software RNGs: verifiability. Hardware RNGs are much harder to scrutinize than software RNGs, and we have seen failures in RDRAND.


To the main topic here: async RNGs. I have seen no interest in asynchronous random number distributions or shuffling/sampling, nor would this be simple, thus the topic appears to be limited to generation and perhaps caching.

This requires no interaction with the rest of rand, hence is best left to another crate.

It would be easy enough to adapt a synchronous RngCore RNG to operate through an asynchronous AsyncRng trait; this also does not need any new code within rand or rand_core (or rand_chacha etc.).

Presumably the interest in having an async variant of rand_core is to then implement its trait for various hardware RNGs. This would fit better under the purview of an embedded project, though I guess we could add a rust-random/rand-async repo for this (assuming someone else offers to maintain this).

In the meantime, I'm closing this issue.