Move PRNG algorithms out of Rand

pitdicker commented 6 years ago

Edit by dhardy: adding a tracker for the planned tasks:

[x] Replace SmallRng algorithm (see https://github.com/dhardy/rand/issues/60 and https://github.com/dhardy/rand/issues/52)
[x] Deprecate XorshiftRng (required: that the RNG is available in another crate)
[x] Move ISAAC* to new crate
[x] Publish PCG RNGs somewhere (rand_rngs, rand_pcg or something)
[ ] Publish SFC generators somewhere
[x] Review xoshiro crate by @vks
[ ] Review state of other published RNGs (see linked comment) and fix up if useful
[ ] Document available RNGs (internal and external crates)

Also: we should consider whether we want to rename PRNGs when moving or adding them.

As decided in https://github.com/dhardy/rand/issues/58.

The idea is to only keep the StdRng and SmallRng wrappers in Rand, and to not expose PRNG algorithms directly.

The field of PRNG algorithms is not static, it seems like a good idea to cut Rand partly loose from the developements to improve our backward-compatability and usability story. Advantages (from the original issue)

rand provide easy to reason generator(s) to satisfy the bulk of the users

rand provides low-friction upgrades and keeps the optimal-ish algorithms readily available

rand will not accumulate and/or deprecate specific algorithms throughout time

Specific rand Rng names won't bleed out to the world, like in forum posts, stackoverflow, etc..

Instead we want to provide an official companion crate, rand_rngs, with a couple of blessed implementations.

Whether Rand should depend on rand_rngs or copy the two algorithms used by StdRng and SmallRng is not fully decided. It might reduce code size if a crate depends on both StdRng and its algorithm (currently Hc128Rng) directly, but that seems rare and not worth much. It has the disadvantage that both implementations should remain in sync. Maybe the CI can check this?

If Rand depends on rand_rngs, it would restrict rand_rngs to only depend on rand_core. But that is what rand_core is for. I am for having Rand depend on rand_rngs.

rand_rngs

rand_rngs should provide a good selection of PRNGs.

Including multiple PRNGs in one crate has the advantage of having a single place to provide guidance, as it helps comparing different algorithms with each other. Also it gives one place to offer consistent benchmarks, to run PRNG test suites, to keep up a common quality level, and possibly to develop functionality that is useful for more than one PRNG (e.g. jumping).

Which PRNGs to include has been the topic of endless discussions. A few are basically decided. I fully expect the number of PRNGs to grow over time. But we also should be careful not to expose too many, as there are hundreds. Every PRNG should have one thing that gives it a clear advantage over others, otherwise it is not worth the inclusion.

Normal PRNGs

https://github.com/dhardy/rand/issues/60 explored normal PRNGs. The following 5 I feel comfortable about for an initial version of rand_rngs:

PCG-XSH-RR 64/32 (LCG)
PCG-XSL-RR 128/64 (MCG)
SFC (32-bit variant)
SFC (64-bit variant)
Xoroshiro128+

The two PCG variants offer good quality and reasonable performance. SFC provides high performance and a chaotic PRNG (not a fixed period). Xoroshiro128+ has acceptable quality but high performance.

An PRNG put together by me, XoroshiroMT, is also good quality and sits between PCG and Xoroshiro128+ qua performance. But now that there is just a new Xoshiro PRNG, it may be worth evaluating that one first, as it is said to also be good quality.

CSPRNGs

For CSPRNGs we currently have two good implementations in Rand

ChaCha20
HC-128

ChaCha20 offers reasonably good performance and uses little memory, while HC-128 brings high performance at the cost of using much memory.

I would also like to see some implementation of AES-CTR DRBG eventually, as it is commonly used, and has hardware support on modern desktop processors.

Other / deprecated PRNGs

We currently have the ISAAC, ISAAC-64 and Xorshift128/32 PRNGs in rand. They have no real advantages over the PRNGs metioned before. It is better to use HC-128 instead of ISAAC, and Xoroshiro128+ or PCG instead of Xorshift. We can include them in something like a deprecated module, but I propose to move them to stand-alone crates outside the Rand repository.

Steps to take

move the prngs module to a seperate rand_rngs crate in the Rand repository, similar to rand_core.
move generators benchmarks over.
lift PRNG implementations from https://github.com/pitdicker/small-rngs (I have updated it to master a while ago, but still have to push the changes)
lift Xoroshiro128+ from https://github.com/vks/xoroshiro/

Does this sound about right?

vks commented 6 years ago

Why not have one crate per RNG?

On Sun, May 6, 2018, 15:40 Paul Dicker notifications@github.com wrote:

As decided in dhardy#58 https://github.com/dhardy/rand/issues/58.

The idea is to only keep the StdRng and SmallRng wrappers in Rand, and to not expose PRNG algorithms directly.

The field of PRNG algorithms is not static, it seems like a good idea to cut Rand partly loose from the developements to improve our backward-compatability and usability story. Advantages (from the original issue)

rand provide easy to reason generator(s) to satisfy the bulk of the users

rand provides low-friction upgrades and keeps the optimal-ish algorithms readily available

rand will not accumulate and/or deprecate specific algorithms throughout time

Specific rand Rng names won't bleed out to the world, like in forum posts, stackoverflow, etc..

Instead we want to provide an official companion crate, rand_rngs, with a couple of blessed implementations.

Whether Rand should depend on rand_rngs or copy the two algorithms used by StdRng and SmallRng is not fully decided. It might reduce code size if a crate depends on both StdRng and its algorithm (currently Hc128Rng) directly, but that seems rare and not worth much. It has the disadvantage that both implementations should remain in sync. Maybe the CI can check this?

If Rand depends on rand_rngs, it would restrict rand_rngs to only depend on rand_core. But that is what rand_core is for. I am for having Rand depend on rand_rngs. rand_rngs

rand_rngs should provide a good selection of PRNGs.

Including multiple PRNGs in one crate has the advantage of having a single place to provide guidance, as it helps comparing different algorithms with each other. Also it gives one place to offer consistent benchmarks, to run PRNG test suites, to keep up a common quality level, and possibly to develop functionality that is useful for more than one PRNG (e.g. jumping).

Which PRNGs to include has been the topic of endless discussions. A few are basically decided. I fully expect the number of PRNGs to grow over time. But we also should be careful not to expose too many, as there are hundreds. Every PRNG should have one thing that gives it a clear advantage over others, otherwise it is not worth the inclusion. Normal PRNGs

dhardy#60 https://github.com/dhardy/rand/issues/60 explored normal PRNGs. The following 5 I feel comfortable about for an initial version of rand_rngs:

PCG-XSH-RR 64/32 (LCG)

PCG-XSL-RR 128/64 (MCG)

SFC (32-bit variant)

SFC (64-bit variant)

Xoroshiro128+

The two PCG variants offer good quality and reasonable performance. SFC provides high performance and a chaotic PRNG (not a fixed period). Xoroshiro128+ has acceptable quality but high performance.

An PRNG put together by me, XoroshiroMT, is also good quality and sits between PCG and Xoroshiro128+ qua performance. But now that there is just a new Xoshiro PRNG, it may be worth evaluating that one first, as it is said to also be good quality. CSPRNGs

For CSPRNGs we currently have two good implementations in Rand

ChaCha20

HC-128

ChaCha20 offers reasonably good performance and uses little memory, while HC-128 brings high performance at the cost of using much memory.

I would also like to see some implementation of AES-CTR DRBG eventually, as it is commonly used, and has hardware support on modern desktop processors. Other / deprecated PRNGs

We currently have the ISAAC, ISAAC-64 and Xorshift128/32 PRNGs in rand. They have no real advantages over the PRNGs metioned before. It is better to use HC-128 instead of ISAAC, and Xoroshiro128+ or PCG instead of Xorshift. We can include them in something like a deprecated module, but I propose to move them to stand-alone crates outside the Rand repository. Steps to take

move the prngs module to a seperate rand_rngs crate in the Rand repository, similar to rand_core.

move generators benchmarks over.

lift PRNG implementations from https://github.com/pitdicker/small-rngs (I have updated it to master a while ago, but still have to push the changes)

lift Xoroshiro128+ from https://github.com/vks/xoroshiro/

Does this sound about right?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/rand/issues/431, or mute the thread https://github.com/notifications/unsubscribe-auth/AACCtJsl20y6PoCkWCeYCSRhM0dRq4Pvks5tvv0ygaJpZM4T0Cak .

pitdicker commented 6 years ago

See the first post and the linked issue.

dhardy commented 6 years ago

I actually see no reason that we shouldn't have one crate per RNG (or perhaps family of RNGs), but I think it's worth having these in the Rand repo and the crates owned by the Rand team (currently me and the libs team). Or we could try organising more and have a rust-rand project and set of repositories on GH maybe.

TheIronBorn commented 6 years ago

Both the new xoshiro and xoroshiro** variants I think are worth consideration.

Vigna claims xoroshiro256** has a higher degree polynomial than PCG 128 variants https://www.reddit.com/r/programming/comments/8gx2d3/new_line_of_fast_prngs_released_by_the_author_of/dygkkng/, so xoshiro variants are probably even stronger.

Edit: poor wording

burdges commented 6 years ago

Is the separate repository is merely a transitional detail? It improves code size if any code that gets used actually gets exposed somewhere, even if only in another crate. It would improve tests if all this lives in the rand repository, right?

dhardy commented 6 years ago

@TheIronBorn this issue isn't about what algorithms we include, it's just about organisation and presentation.

dhardy commented 6 years ago

@burdges I'm not sure what you mean about transitional; no. The easiest option is I think just to have sub-projects in this repo (like rand_core).

I'm not even sure if it does help with testing having everything in the same repo; in one way it makes it worse since CI must run all unit-tests on any change (although with API-breaking changes this may be necessary).

Once we have stable core APIs it may make more sense to use separate repos since most changes (docs, new features, tweaks to other features) will not affect other crates. I.e. once rand_core is stable then changes to things like the distribution code will have no effect on the RNG implementations, so separate repos do make some sense.

pitdicker commented 6 years ago

In this issue and the previous one I have given arguments that it would really benefit users to have one crate, as it helps comparing and contrasting them. And real but smaller benefits for us. Are those reasons wrong?

Maybe it is good to also collect the reasons multiple crates can be a good idea.

Two related thoughts:

most non-cryptographic PRNGs only need 5-10 lines for the algorithm. About 30 additional lines to implement the necessary traits. Then a handful of tests, documentation, license header, and benchmarks. Seems a bit small for a crate to me, but that is not unique.
how do we show separate crates are similar enough to compare directly (benchmarks, quality, support)? Anyone is free to create a rand_* crate.

CI must run all unit-tests on any change.

The unit tests we have now for PRNGs run in a fraction of a second, easily a 100 times less than the set-up time for Travis.

vks commented 6 years ago

There is precedence to have one crate per algorithm in RustCrypto. You can still have a crate reexporting them all, but I'm not sure how useful that is. It might make sense to have some comparison in Rand's documentation, where we advertise the recommended RNGs and their crates.

On Mon, May 7, 2018, 08:32 Paul Dicker notifications@github.com wrote:

In this issue and the previous one I have given arguments that it would really benefit users to have one crate, as it helps comparing and contrasting them. And real but smaller benefits for us. Are those reasons wrong?

Maybe it is good to also collect the reasons multiple crates can be a good idea.

Two related thoughts:

most non-cryptographic PRNGs only need 5-10 lines for the algorithm. About 30 additional lines to implement the necessary traits. Then a handful of tests, documentation, license header, and benchmarks. Seems a bit small for a crate to me, but that is not unique.

how do we show separate crates are similar enough to compare directly (benchmarks, quality, support)? Anyone is free to create a rand_* crate.

CI must run all unit-tests on any change.

The unit tests we have now for PRNGs run in a fraction of a second, easily a 100 times less than the set-up time for Travis.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/rand/issues/431#issuecomment-386970760, or mute the thread https://github.com/notifications/unsubscribe-auth/AACCtKxNDruFXra7bvopRZ0gZTadbl2_ks5tv-qSgaJpZM4T0Cak .

dhardy commented 6 years ago

So sounds like

we should put new crates within the same repo, at least for now (like rand_core is)
everything in the prng module should move
still to decide exact crates
probably we should make rand depend on whichever crates are necessary for the StdRng and SmallRng algorithms

How many crates do we want? Only 1-3 (e.g. crypto, non-crypto and deprecated)? One crate per family of RNGs? One crate per RNG?

having many crates means we don't have to deprecate generators; we can just update the doc and ignore the relevant crate(s) (potentially move to a new repo)
having many crates may make comprehensive testing and benchmarking painful (extra scripts required; lots of binaries to compile); having a "master" crate may alleviate this
each crate has extra boilerplate (cargo.toml, readme, possibly tests and benchmarks)
each crate has to be published independently

@newpavlov do you have anything to say about the many-small-crates design of RustCrypto's hashes (and other projects)?

newpavlov commented 6 years ago

In my opinion small crates work pretty well (but not too small, i.e. crates for family of algorithms, not for each variation), yes there is some headache with coordinated upgrades to the new trait version, but if rand_core will be relatively stable, I don't think it will be a big problem.

having many crates may make comprehensive testing and benchmarking painful

cargo test --all and cargo bench --all more or less solve this problem. There is an issue with features when you work with virtual manifest, but I don't think that RNG crates will suffer from it.

each crate has extra boilerplate (cargo.toml, readme, possibly tests and benchmarks)

RustCrpyto partially solves this problem by having feature gated dev module in trait crates, which define functions and macros to make it easier to write tests and benchmarks. Though I am not sure if RNG crates should follow this approach.

I would like suggest to move CryptoRng crates to the RustCrypto/CSRNGs repo. (of course with full access granted to @dhardy and lang team) Pros:

CSPRNGs usually tightly coupled with stream ciphers, so they could share underlying implementations, which will make easier to work on optimizations.
I plan to have other CSRNGs in addition to algorithms from rand crate, e.g. rdrand (@nagisa gave his consent) and aesrng developed by @vks

Con is of course is the split of RNG crates between two organizations.

dhardy commented 6 years ago

Actually we already stopped using --all for most tests because of this issue. It's a shame it's not been solved.

burdges commented 6 years ago

It'd be nice to save code size by using stream cipher code for CSPRNGs of course, but not that pressing. At first blush, I kinda think moving CSPRNG crates into another project, especially one less blessed by Mozilla, sounds premature and distracting. At minimum, all the BlockRng business will make reading the code annoying.

dhardy commented 6 years ago

With our current PRNGs and currently-proposed additions, a crate-per-family approach looks something like:

rand_rng_xorshift
rand_rng_xoroshiro (or could be with xorshift; also can include xoshiro)
rand_rng_pcg
rand_rng_sfc
rand_rng_isaac
rand_rng_chacha
rand_rng_hc (could potentially also house HC-256, if we have any use for it)

Lets not worry about the generators for now (there will probably be more); this is several crates already, and some of them will be very small (pcg unless we add more algorithms, sfc, xorshift unless combined with xoroshiro; also hc and chacha are only 500 lines).

There is nothing inherently wrong with this, but it is a significant amount extra boilerplate just to save pulling unused PRNGs into a project (the entire rand::prng module is only 2600 lines, albeit because we have avoided expanding this much until now).

Are there any alternatives worth discussing?

vks commented 6 years ago

I would probably use shorter names and drop the rng_, but other than that I think one crate per family is reasonable. I think this is nicer for other crates, because they can depend on the specific algorithm without having to worry whether it is currently used by Rand or not, and we can keep Rand minimal at the same time.

pitdicker commented 6 years ago

Just to be sure, the only real argument for having multiple crates is that at some point is becomes 'cleaner' for us to deprecate an RNG?

And it adds the problem of boilerplate, making the PRNGs harder to compare, more difficult to guide users, and anyone is free to make a crate with a seemingly official name.

pitdicker commented 6 years ago

I would like suggest to move CryptoRng crates to the RustCrypto/CSRNGs repo. (of course with full access granted to @dhardy and lang team) Pros:

CSPRNGs usually tightly coupled with stream ciphers, so they could share underlying implementations, which will make easier to work on optimizations.

I plan to have other CSRNGs in addition to algorithms from rand crate, e.g. rdrand (@nagisa gave his consent) and aesrng developed by @vks

Con is of course is the split of RNG crates between two organizations.

I am afraid this would only cause more work, not less. While algorithms may be (mostly) the same, RustCrypto and Rand can have different goals. What happens when one decides some algorithm is fit for inclusion, but the other does not. Or that the trade-off for performance vs security should be different? We would have to have a very clear sense of direction and and idea of what both projects see in such a crate.

Now I am not saying we can't work together, and work together well. But at this point I think it is better for us if we keep this only in mind as an option, and rethink it next year or something like that. Rand has changed a lot in the past couple of months, and it seems like a good idea to me to remain flexible and get our story straight first. Especially the amount of security features to provide is something we have quite a few open issues for, and seems a bit fuzzy.

newpavlov commented 6 years ago

What happens when one decides some algorithm is fit for inclusion, but the other does not.

if it's officially published algorithm (i.e. not home-brew crypto) designed as CSRNG I will be happy to see it in the repo, so RustCrypto/CSRNGs is intended to be more inclusive than rand.

Or that the trade-off for performance vs security should be different?

Ehm, for CryptoRng implementations priority should be security, otherwise it should not implement this trait.

pitdicker commented 6 years ago

Ehm, for CryptoRng implementations priority should be security, otherwise it should not implement this trait.

:smile: Maybe I should write a bit clearer about which trade-offs I had in mind. Nothing terrible I hope.

As an example, the BlockRng wrapper (used by ChaChaRng and Hc128Rng) could choose to zero out a value after it was read from the buffer. That brings down the performance of the RNG by 10-20%. It would make it more difficult for someone peeking at the memory to recover previous values. But in my opinion, when someone can inspect the memory of a process, you have already lost. So we currently don't zero after reading a random number from a block of buffered results.

Another one I have in mind is fork protection. I hope we can add that to the ReseedingRng wrapper soon. It means checking a global static and reseeding if it has changed. Should that happen on every iteration, or only when a new block of values has to be generated? Again that is the choice between 1-2% overhead, or 10-20%. I would say having something like 15 duplicated values after a fork (which is not supposed to happen) is an acceptable trade-off, especially because it means users have no good reason to turn off the protection.

burdges commented 6 years ago

Zeroing ChaChaRng output does nothing. If you can hammer the offset then you can hammer the block number, well they even live in the same struct.

dhardy commented 6 years ago

Just to be sure, the only real argument for having multiple crates is that at some point is becomes 'cleaner' for us to deprecate an RNG?

What's the alternative — maintain a crate that may grow to a hundred or more PRNGs, or be extremely selective about which we include? Perhaps we won't include enough PRNGs for size to be an issue, but in that case what's the motivation for using an external crate in the first place?

The only other reason I can see for separate crates is to allow an app to use rand_core + some PRNG without importing much extra code, i.e. to minimise what needs to be reviewed.

pitdicker commented 6 years ago

What's the alternative — maintain a crate that may grow to a hundred or more PRNGs, or be extremely selective about which we include?

PractRand contains 200+ RNGs, and that is already a selection. In my opinion having that many is not useful. Even the opposite, too many choices distract and complicate, so having many choice would be bad.

What I had in mind (with the first post, and the relevant issues in your repro), was to have a small, well chosen selection. Not all that different from how we work now. Users that want more than StdRng or SmallRng should just be able to go to one crate and have a clear overview of good choices. With a selection of PRNGs that come from different families, have different performance, quality, memory usage, security and features. I don't see that become more than 20 choices.

My idea was not to have some super-crate or organization to implement every (possibly even obsolete) PRNG design under the sun. But to have a good resource for users.

And other crates can still have a role in providing more variants or features, like the current PCG crate.

vks commented 6 years ago

@pitdicker What you are suggesting would be possible by reexporting the smaller crates in a "curation" crate. I would prefer this over duplicating code.

Note that a small selection might become not so small as algorithms become obsolete and get replaced but have to be kept for backwards compability.

dhardy commented 6 years ago

In this case we have three options:

Don't add individual crates. In this case we could still potentially add them later.
Use individual crates only. Add a section in the Rand documentation giving an overview of good generators across these crates.
Add individual crates and a "curation" crate. The only advantages over the above are somewhere more specific to house overview documentation and a convenient place for users to import from. I don't see much advantage here however.

If we keep the number of choices low (e.g. 20 algorithms) then the first option is a good choice, though as @vks says this may grow over time (removing algorithms is not a good idea except perhaps if a CSPRNG is found to be insecure, because it causes problems for any libraries needing reproducibility with old algorithms and requires extra maintenance).

Also, no one has answered my other question yet: if we don't have per-family crates, is there any point at all moving the algorithms out of the main Rand crate?

vks commented 6 years ago

if we don't have per-family crates, is there any point at all moving the algorithms out of the main Rand crate?

We could make breaking changes in the RNG crate and use it in Rand without making a breaking change there (if the RNG crate is not part of Rand's public interface).

dhardy commented 6 years ago

We could make breaking changes in the RNG crate

Such as:

Incompatible changes to an algorithm. Rare, since usually we only implement mature algorithms and test against test vectors (though e.g. handling of next_u32 followed by next_u64 has changed in current generators).
Changes to the API. But this really comes down to the rand_core traits, which are unlikely to have breaking changes (and would require bumping the Rand version number anyway).
Deprecating or removing algorithms. But this never has to happen quickly (except maybe for security, but that seems unlikely and may not be the best way to handle a security issue anyway).

Also using an external crate allows usage with rand_core but without the main crate; however I'm not sure this is a big advantage, at least unless we choose to further modularise Rand.

Lacking further argument I propose we just add new PRNGs to Rand for now. Alternatively I'm not against adding a curation crate(rand_rngs) or something, though in that case we would probably not want to add the ISAAC generators to the curation crate. Or adding family-crates as mentioned above.

Just to be sure, the only real argument for having multiple crates is that at some point is becomes 'cleaner' for us to deprecate an RNG?

It's easier to review a crate containing only a small number of similar RNGs (i.e. keep things like Xorshift and Hc128 separate), although I don't see a good reason that e.g. ChaCha and Hc128 shouldn't be in the same crate (other than lack of an obvious name).

dhardy commented 6 years ago

Well, nothing has happened in the last month, yet I feel this issue is quite important.

I suggest we move forward as follows:

We replace SmallRng's implementation with something else (do not discuss which algorithm here!), keeping the implementation hidden. This gives us a better SmallRng without dependence on any other crate or public PRNG in Rand.
We deprecate XorshiftRng. If we can, we point them to another crate with a compatible implementation.

We should move forward with some small(er) crates implementing families of PRNGs. On a case-by-case basis we decide whether to host these externally or within the Rand repository (which may or may not be considered to imply a higher level of review). (Alternatively we could start a "Rust Rand" organisation on GitHub; not sure that it's worth it though.)

The following PRNG crates already exist:

@pitdicker has a small-rngs repository (not published as a crate)
@vks has created a xoshiro crate for Xoshiro and Xoroshiro RNGs.
@astocko has an xorshift crate, though it appears to be unmaintained and overhaps with @vks's xoshiro; maybe it could focus on just Xorshift generators
@robojeb has a pcg_rand crate with many variants available
@afnanenayet has a simple pcg crate containing just PCG 32 (though this doesn't use Rand traits)
@dcrewi has a nice Mersenne Twister implementation

Related, external RNGs:

@nagisa has an rdrand crate
@vityafx has a randomorg crate, though it doesn't use Rand

Quality and maintenance:

Of the above, only @vks's crate uses rand_core 0.2; all others use rand 0.3.x or 0.4.x if they depend on Rand at all. I don't know if this says we should be doing more to help other authors or that most of the above are basically abandoned; either way we can't really promote external RNG crates which don't support the latest Rand.
Code in the Rand repo may get better quality review than external crates. For non-crypto PRNGs this is not such a big deal so long as the implementations test against published test vectors.

Therefore I think we should slightly prefer internal implementations of PRNGs in our documentation and be open to pull-requests to add PRNGs here, but not require it.

Promoting RNGs: I'm not convinced at this time it's worth having a "curation crate", though we could do so. A better option might be to add a document to Rand reviewing various RNG crates (there are a few other things which could go in a Rand "book").

Organisation:

Edit: we should probably just use the crate names without subdirectory.

~Suggestion: we add an rngs directory, then any sub-crates in this repo can go under here (e.g. REPO_ROOT/rngs/xoshiro)~

"Crypto" RNGs

We currently have only two implementations, Hc128Rng and ChaChaRng. For now I suggest we leave them in Rand's main crate, though we could move to rngs/csprng (rand_csprng crate) later.

Since the ISAAC generators are no longer used by ThreadRng, lets move them out to rand_isaac crate/subdir).

Naming

This would be a good time to rename PRNGs, if appropriate.

Currently all internal PRNGs have a Rng suffix, e.g. XorShiftRng. Some other implementations follow this convention; some don't. To me it seems redundant to use within a crate of RNGs. Therefore we could rename e.g. IsaacRng to Isaac32 or just Isaac.

Crate names: I suggest that all crates created as part of the Rand project use the prefix rand_ (following the convention of rand_core). I'm not sure whether we should make a rename when adopting an external crate into the Rand project; possibly we should do so but also with a compatibility release under the old name.

Does this sound like a good plan?

vks commented 6 years ago

I agree with your plan, it is almost exactly what I would suggest as well.

A better option might be to add a document to Rand reviewing various RNG crates (there are a few other things which could go in a Rand "book").

In addition, I would recommend to use specific RNGs and their crates for reproducibility. At the same time, we could explicitly weaken the reproducibility guarantees for SmallRng and StdRng, so we can replace the algorithms in a feature release.

About the pcg_rand crate: I recently tried to port it to rand_core 0.2, which was surprisingly tricky due the heavy use of generics (similar to the reference C++ implementation). I ran into a compiler bug and gave up, but it might be possible to redesign the library to work around the bug (see https://github.com/robojeb/pcg_rand/issues/8).

dhardy commented 6 years ago

The reproducibility limitations already apply to SmallRng and StdRng (maybe doc could be improved).

I also thought the use of generics in that crate is a bit over the top (the same criticisms apply to @imneme's libraries — most users want one relatively simple PRNG; not metacode for building hundreds of variants of generators). So I'd rather start from @pitdicker's code.

dhardy commented 6 years ago

I added a tracker at the top of this issue.

@vks do you want to keep your xoshiro crate external or move it into this repo? I really don't care either way; externally you get more freedom but also the expectation of maintenance :wink:

Also, what should we do with Xorshift? It's not especially good, but we should probably keep it around somewhere in case anyone wants to reproduce results with it. @astocko has an xorshift crate but it needs updating, and doesn't have the same generator.

On another point: do we want to continue with only one repo or have multiple? Already we have some confusion of branch & tag names with 0.5 referring to the rand version but also hosting rand_core 0.2.x.

Separate repos would give us more git and CI overhead but smaller history and less CI work to do on each change.

vks commented 6 years ago

@dhardy In principle, I'm fine with either way. It depends on what we decide about the fate of PRNGs in Rand: If Rand somehow depends on a PRNG, I think the crate should be in the Rand repository. If Rand just recommends the crate, either way is probably fine. Or should we be worried that a crate we recommend gets compromised? After all, Rand is the third-most popular crate. However, I anticipate the recommended crates will be used by few users, making them a smaller target.

If we decide to give the recommended crates a rand_ prefix, they should probably live in the Rand repository. This would probably the the easiest way to move xorshift and isaac out of Rand, for which crates already exist. Similarily, we could add rand_pcg, rand_xoshiro etc. (Unfortunately there already is a pcg_rand crate, which might be confusing.)

About splitting repos: I'm not sure. It worked well for num, but there the crates are more orthogonal. I would expect there are few issues that only apply to rand_core.

dhardy commented 6 years ago

For non-crypto PRNGs we only need one in Rand (SmallRng) and there isn't a lot of code involved, so we could just copy the code. For CSPRNGs, yes, we should only depend on Rand crates (or perhaps a RustCrypto crate).

I think we should use the rand_ prefix for Rand sub-projects but not for external crates, and if we adopt an external crate into Rand we can rename the crate then (it's not hard for users to switch this, and avoids stepping on the toes of inactive authors).

True, at the moment most issues are strongly linked to the main Rand project even if about some other part such as a rand_core trait or another distribution implementation. Maybe we should reconsider later, but for now continue with one repo.

droundy commented 6 years ago

Would a pull request for a Xorshift1024 crate be accepted at this stage? I'm wanting to have xorshift1024* for a monte carlo simulation I'm writing, and it looks like moving forward having it hosted in this repository would be ideal (and match with the plans outlined here). I'd open a separate issue, but it seems slightly subsumed by this one.

TheIronBorn commented 6 years ago

You mean this xorshift1024*φ algorithm?

~~Given that there are a couple generations of Vigna's generators which more or less obsolete the algorithm, I'd say it's unlikely it would be included.~~ We are considering adding this xorshift crate which has xorshift1024* generators.

xoshiro512 or xoroshiro1024 would be more likely. The xoshiro crate has xoshiro512 and xoroshiro1024 generators could be added. (xoroshiro1024 is described only in the Scrambled Linear Pseudorandom Number Generators paper)

It's also possible that other generators with similarly large periods might be added.

TheIronBorn commented 6 years ago

Note: xorshift4096* also exists. Probably impractical but it's curious. (not sure why it's no longer listed)

dhardy commented 6 years ago

Well, I'll try again: @astocko are you interested in maintaining your Xorshift crate?

But given the lack of commits in the last year it seems unlikely. The code is CC0 so it may be sensible at this point to pull out what we want and make a new rand_xorshift crate.

@vks what do you think should be the divide between Xorshift, Xoroshiro and Xoshiro generators, since you already have the latter two in your crate?

iddm commented 6 years ago

I see you mentioned me here as an author of randomorg crate. I am the one and I just wanted to remind you that this crate unfortunately is not going to have Rng traits implementations due to its nature of remote data source.

vks commented 6 years ago

@dhardy My crates are based on the recommendations by Vigna. The xoroshiro crate includes xorshift1024*phi, because that was recommended together with the xoroshiro generators. The xoshiro crate implements the new generation of Vigna's generators, which includes some xoroshiro variants incompatible to the xoroshiro crate (!). So yes, the crate names are a bit misleading, but I think the current separation makes the most sense. A more accurate name would be rngs_vigna2018, but I'm not sure it would be a better name.

I should probably clarify that in the docs.

On Sat, Jul 14, 2018, 09:52 Diggory Hardy notifications@github.com wrote:

Well, I'll try again: @astocko https://github.com/astocko are you interested in maintaining your Xorshift crate?

But given the lack of commits in the last year it seems unlikely. The code is CC0 so it may be sensible at this point to pull out what we want and make a new rand_xorshift crate.

@vks https://github.com/vks what do you think should be the divide between Xorshift, Xoroshiro and Xoshiro generators, since you already have the latter two in your crate?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/rand/issues/431#issuecomment-405006688, or mute the thread https://github.com/notifications/unsubscribe-auth/AACCtBdWGQeAOE_GE66fWNGjJW8V1cJjks5uGaNFgaJpZM4T0Cak .

dhardy commented 6 years ago

We could just throw it all under rand_vigna, but a separate crate for Xorshift is also fine. His recommendations have changed over time, so putting this years' recommendations under vigna2018 doesn't seem sensible for what we hope will become a stable crate.

vks commented 6 years ago

Well, stability is why I decided to create a new crate, so the old one can still be used for reproducibility. One of the advantages of moving the algorithms out of Rand it that we can just add and drop crates from the recommendations without breaking anyone's code.

afnanenayet commented 6 years ago

Hey, author of the Pcg32 crate here. The other PcgRand crate seems much more full-fledged than mine, but I'd be happy to continue maintaining Pcg32.

I personally would advocate for some sort of vetting/benchmarking with the rand crates. Even something like dieharder scores would be nice to see.

TheIronBorn commented 6 years ago

We currently have quality comparisons here: https://docs.rs/rand/0.5.4/rand/prng/index.html#basic-pseudo-random-number-generators-prngs

I don't know if we've thought about listing actual test data though (probably TestU01 or PractRand if anything)

dhardy commented 6 years ago

@droundy if you're still interested, we'd accept Xorshift1024 in the new rand_xorshift sub-crate (see #557).

dhardy commented 6 years ago

@pitdicker I think it's time to turn your small-rngs repo into a real crate :smile:

As you say, it is more useful having a small crate with a few high-quality PRNGs than to have a huge collection, so how about we create rand_small_rngs with the following guide-lines:

non-crypto only
keep it small (5-20 RNGs)
try to avoid overlap with other high-quality crates — which probably means Xorshift and Xoshiro should be omitted from this crate, though (for now at least) some PCG variants included

Potentially we can add a separate rand_rngs sub-project for comparison purposes (benchmarks, quality testing). These should be separate: the former is a small collection of high-quality RNGs not hosted elsewhere (and should remain small); the latter is a comparison across many RNGs (and could host the generators.rs benchmarks later).

droundy commented 6 years ago

I've just gone ahead and implements a xoroshiro256+ in my own code, which ended up seeming simplest, after I decided that compatibility with my old C code didn't matter. Maybe I'll use a "standard" xoroshiro implementation at some later point, but for now it seemed easiest to just translate the C code myself. That way I didn't have to mess with serde feature flags, and I could easily get proper seeding with a u64 as recommended by Vigna.

peteroupc commented 6 years ago

If it helps you decide which PRNGs to include where, I have written an article on random number generators and categorized "good" RNGs into two categories. Essentially:

A cryptographic PRNG is designed such that prior unseen outputs are hard to guess, and has a state length of at least 128 bits.
A statistical PRNG "must either satisfy the one-way property or be significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of BigCrush", and has a state length of at least 64 bits.

Any RNG other than a cryptographic or statistical RNG, as defined in my article, should be considered low-quality, I think.

afnanenayet commented 6 years ago

I updated my pcg crate to implement the RngCore/Rng traits

rust-random / rand