rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand
Other
1.67k stars 431 forks source link

Staggered reseeding #1357

Closed Pr0methean closed 12 months ago

Pr0methean commented 1 year ago

Background

This FR is designed for Monte Carlo simulations.

Feature request

When multiple threads are running instances of the same simulation, it's likely they'll generate pseudorandom numbers at about the same rate. This means that if they use a regularly reseeded PRNG such as thread_rng(), they're likely to come up for reseeding at about the same time and therefore block as they contend to access /dev/urandom and it gains a backlog of seed-generating work. To fix this, different threads' PRNGs could be set to output different staggered numbers of bytes (I'd suggest the binary Van der Corput sequence linearly transformed to the 64KiB~128KiB range) before their respective first reseedings. Another option would be to have one dedicated thread that reads seed bytes into a ring buffer, stopping only when the buffer is full, and have each PRNG read seeds from it (although this probably couldn't be both lockless and starvation-free).

newpavlov commented 1 year ago

This means that if they use a regularly reseeded PRNG such as thread_rng(), they're likely to come up for reseeding at about the same time and therefore block as they contend to access /dev/urandom and it gains a backlog of seed-generating work.

We do not use blocking randomness sources outside of an initial check of entropy pool initialization, so there is usually no "seed-generating work" triggered by re-seeding of ThreadRng. I highly doubt that it's possible to observe a measurable impact on performance caused by your scenario. Even if we are to assume that OS needs to get an exclusive block for some inner resource (a very big assumption), threads will quickly get out of sync with each other and even difference in start times between threads should be sufficient enough.

Also, if such possibility is your concern, then you probably should not use ThreadRng in the first place. Instead you should use a faster (possibly, non-cryptographic) PRNG which gets initialized only once per worker thread.

vks commented 12 months ago

I think we are only reseeding for "key erasure", i.e. it's a security feature irrelevant to your use case. (I think there are better alternatives for key erasure, but this is not relevant to this issue.)

If you care about performance, I strongly recommend against using thread_rng(). Just use StdRng or SmallRng directly. If you care about reproducibility, something like rand_xoshiro::Xoshiro256PlusPlus might be better (assuming you don't need cryptographic unpredictability for your application). You could even use jump() to guarantee 2^128 non-overlapping sequences.

Pr0methean commented 12 months ago

I'd still feel more confident in the results of a Monte Carlo simulation if I knew each thread was being reseeded in the time it would take to "learn" any patterns from the current seed -- after all, you never know whether your simulation will turn out to depend on subtle patterns in your PRNG and accidentally become a randomness test, especially if you're using evolutionary optimization. I was thinking more along the lines of a "mostly best-efforts" approach to reseeding -- e.g. start non-blocking attempts to reseed at 32KiB of output, but don't block waiting for a seed until 128KiB of output.

newpavlov commented 12 months ago

start non-blocking attempts to reseed at 32KiB of output, but don't block waiting for a seed until 128KiB of output.

As I wrote above, we do not use "blocking" entropy sources outside of potential initialization checks which usually happen only once during program execution.

I don't think this issue is actionable, ThreadRng is arguably should not be used in the author's use case and a custom solution should be used instead (e.g. fast CSPRNG initialized once or periodically re-seeded fast non-cryptographic PRNG), so I am going to close this issue.