rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand
Other
1.68k stars 430 forks source link

Is there a `L'Ecuyer` or MRG32k3a implementation? #997

Closed CGMossa closed 3 years ago

CGMossa commented 4 years ago

Background

What is your motivation?

I've looked for an implementation in rand_core and in the docs, plus searched the repository and found no mention of L'Ecuyer.

What type of application is this? (E.g. cryptography, game, numerical simulation)

I believe that L'Ecuyer PRNG algorithm yields numbers that are suited concurrent sampling.

Feature request

I think this should be implemented if not already. I've been given these resources on it:

http://www.iro.umontreal.ca/~lecuyer/myftp/papers/streams00s.pdf https://github.com/vigna/MRG32k3a https://pubsonline.informs.org/doi/abs/10.1287/opre.47.1.159

I'm going to attempt a re-write.

vks commented 4 years ago

Why are you looking for this RNG in particular? What does it offer over the other ones already implemented in Rand (i.e. chacha, xoshiro)?

CGMossa commented 4 years ago

I'm not an expert on this, I just have seen in the R community, that this prng is also in consideration when evaluating different prngs. I believe xoshiro is also a parallel-safe prng, so l'ecuyer doesn't offer something new. Are there a TestU01-test/benchmark written anywhere?

dhardy commented 4 years ago

parallel-safe

Does this term mean anything?

You need independent RNG state for each that to get good performance.

Perhaps you mean that one can safely seed multiple independent streams. For this purpose we specifically chose RNGs supporting ~128-bit seeds or larger. There are some potential issues with PCG streams being too similar, so not all of those RNGs are suitable (but the ones with 64-bit output already use 128-bit internal state).

You also need to ensure the seeding mechanism does not create similarities in state — some people recommend using a different type of RNG for seeding each thread's generator; I'd recommend using a (near) crypto-grade generator such as ChaCha since any such similarities would violate the requirements on a crypto generator. Seeding one RNG from another is easy; see the book.

vks commented 4 years ago

Maybe we should add a paragraph to the book about initializing RNGs for parallel computations?

CGMossa commented 4 years ago

Yeah, that's what I was thinking of.

I don't know why the different bits length matters. I'll close this, as I can see this "multiple independent streams"-business has been addressed.

PS: @vks yeah, because I looked at the book first, and thought: This is not addressed here, maybe there's something missing.

dhardy commented 4 years ago

Bit lengths: just because if the PRNG state is too small (e.g. 64 bits) and you have many parallel instances each drawing many values, it is conceivable that some will overlap. With 128 bits that isn't really a problem.

Yes, the book may need improving. Leave this open as a reminder until it does. Any volunteers to address it?

CGMossa commented 4 years ago

I'd like to help :D

dhardy commented 4 years ago

Cheers! Hopefully this will get you started.

CGMossa commented 4 years ago

I tried to get started on this, but the book is failing the tests. I've also discovered that Julialangs sampler is not implemented in rand. It is (apparently) very fast.

I don't think I can take the lead on this task. Will help as much as I can otherwise.

On Wed, 8 Jul 2020 at 16:22, Diggory Hardy notifications@github.com wrote:

Cheers! Hopefully this https://rust-random.github.io/book/contrib-doc.html#the-book will get you started.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/rust-random/rand/issues/997#issuecomment-655551957, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIDVSC2UDTM7Q2DVG7FJMDR2R6LXANCNFSM4OUPPPUQ .

fabienlefloch commented 4 years ago

I could see several arguments for MRG32k3a and "parallelization". With L'Ecuyer RNG, you may skip ahead fast. Use case is you want to split a Monte-Carlo simulation over N processors. Two kinds of skip-ahead are possible: fixed size (skip 2^40 numbers for example), or variable size: I have an MC simulation over M paths, I want to skip k*M/N for k=0,...,N-1.

Now this is also possible with Chacha (depending on how it is actually used - did not check the details of the implementation in this lib).

Another argument is that perhaps one is moving some existing stuff to Rust and want to keep the same numbers. The existing stuff is likely to use a popular RNG, such as some Mersenne-Twister variant or MRG32k3a. PCG64xyz and xoroshiro++** are fairly new to the scene.

Personally I find it surprising that so much care seems to be taken of in the design of this lib, and yet the author does not seem all too familiar with the scientific community needs.

vks commented 4 years ago

Now this is also possible with Chacha (depending on how it is actually used - did not check the details of the implementation in this lib).

With Chacha you could just generate random seeds, iterate through all possible seeds, or use the set_stream method. You can also skip ahead a variable number of words with set_word_pos.

Another argument is that perhaps one is moving some existing stuff to Rust and want to keep the same numbers. The existing stuff is likely to use a popular RNG, such as some Mersenne-Twister variant or MRG32k3a.

The design of Rand is extensible, you can use a crate (or implement it yourself) if you want to use a legacy RNG.

Personally I find it surprising that so much care seems to be taken of in the design of this lib, and yet the author does not seem all too familiar with the scientific community needs.

What needs of the scientific community are not addressed?

dhardy commented 4 years ago

The design of Rand is extensible, you can use a crate (or implement it yourself) if you want to use a legacy RNG.

We have a rngs repository for additional generators and would be happy to accept implementations of well-known generators there. That said, we have no funding source and cannot commit to implementing much ourselves.

There are reasons to move on from Mersenne Twister, but in case you need to use it to reproduce results, an implementation exists (not from us but compatible). I'm not sure if there are currently MRG32k3a or L'Ecuyer implementations for Rust, but creating and publishing one yourself (or as a pull request on our repository if preferred) shouldn't be so hard.

Use case is you want to split a Monte-Carlo simulation over N processors.

This can be done with skip-ahead, e.g. with ChaCha, but typically we recommend using a cryptographic master generator to seed each parallel generator. Our ChaCha implementation (and most I believe) uses a 64-bit counter, which likely isn't enough for parallel usage, however it uses a 256-bit seed. The parallel generator (ChaCha or other) still needs to support at least 128-bits of state and independent streams (so not recommended to use PCG), but a 64-bit period is acceptable.

vks commented 3 years ago

I'm closing this in favor of https://github.com/rust-random/book/issues/45. Please feel free to open a new issue if something else needs to be addressed as well.