rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand
Other
1.6k stars 423 forks source link

Add trait for RNGs that support efficient jump-ahead #1458

Open benwh1 opened 2 weeks ago

benwh1 commented 2 weeks ago

Background

𝔽₂-linear PRNGs like xoshiro support efficient jump-ahead, but currently the only jump-ahead functions in rand_xoshiro are by 2^(n/2) or 2^(3n/4) steps where n is the number of bits of state, e.g. 2^64 and 2^96 steps for Xoroshiro128StarStar. There is currently no way to jump ahead by any other amount, say by 2^48 or 10^30 steps, without just going one step at a time which is impractical.

Feature request

I propose adding a trait called something like JumpAheadRng or JumpableRng that defines a general jump_ahead function to jump ahead n steps. There was a previous issue about this a few years ago (#1103) and a comment here bringing up a potential issue:

... and here we get to the next problem: ChaCha supports setting the word position with a granularity of 16-word-blocks; Xoshiro supports jumping ahead large numbers of values (2^128 or 2^192 for the 256-bit variant); many generators do not support jumping at all. How do we design a reasonable abstraction over this?

My proposal is for the trait to only support RNGs that can jump ahead by an arbitrary number of steps, and other RNGs that have limited jump-ahead can do their own thing without using the trait (or perhaps their implementation should just do n single steps in a loop? or some combination of single steps and whatever efficient jump-ahead is possible?).

Xoshiro/xoroshiro/xorshift support jump-ahead by an arbitrary number of steps, not just 2^(n/2) or 2^(3n/4). I think the reason why only 2^(n/2) and 2^(3n/4) steps were implemented is because those are the cases that were used in the reference implementations. The implementation for all powers of 2 is the same as here, just using different constants, so arbitrary jump-ahead by m steps can be implemented by sequentially jumping ahead by 2^k for each k where the kth bit of m is set. Note that one (rather minor) downside is that it would require pre-computing and storing a somewhat large (a few kilobytes) list of these jump constants for each type.

If this is something that would be suitable then please let me know and I will implement it and open a PR and add support to rand_xoshiro.

TheIronBorn commented 2 weeks ago

Is there a use-case for arbitrary jump-sizes? 2^64 and 2^96 steps for Xoroshiro128StarStar allow 2^64 and 2^32 jump points. It's difficult to imagine anyone needing ~2^96 random numbers for ~2^32 streams

benwh1 commented 2 weeks ago

Is there a use-case for arbitrary jump-sizes? 2^64 and 2^96 steps for Xoroshiro128StarStar allow 2^64 and 2^32 jump points. It's difficult to imagine anyone needing ~2^96 random numbers for ~2^32 streams

If you generate a lot of numbers on one stream, say 10^12 or more, it would be useful to be able to restart the stream from that point in the event of e.g. a program crash, without having to redo all of those individual steps. It would also enable going in reverse (by jumping P-1 steps where P is the period). I could also see it potentially being useful to researchers, e.g. if they wanted to do something like testing for bias in every nth output for some large n.

dhardy commented 2 weeks ago

I am amenable to the idea of adding JumpableRng, but there is potential for confusion (e.g. with some RNGs next_u64 uses two RNG steps; BlockRng has an internal counter in addition to the RNG step).

As a first step, I'd suggest writing your proposed trait here.

benwh1 commented 2 weeks ago

I think jumping ahead by n steps makes more sense than n calls to next_u64 for exactly that reason, and precisely what counts as a "step" should depend on the specific RNG. Also, if we made it n calls to next_u64 instead, then any RNG where next_u64 requires 2 internal steps would be limited to only being jumpable by an even number of steps. The period of the jump-ahead function would also be only half the period of the RNG itself, which seems like it could be confusing and error-prone.

I don't think the trait needs to be anything complex, something simple like this should suffice.

pub trait JumpAhead: RngCore {
    /// Advances the internal state of the RNG by `n` steps.
    fn jump_ahead(&mut self, n: u128);
}

Some other possible ideas for consideration, although these probably aren't too important:

pub trait Reversible: RngCore { fn step_back(&mut self); }

impl<T: RngCore + Periodic + JumpAhead> Reversible for T { fn step_back(&mut self) { self.jump_ahead(Self::PERIOD - 1); } }

dhardy commented 2 weeks ago

something simple like this should suffice

Given your motivation is to restart an RNG stream at some given position to duplicate a crash, you need to know what that position is. This requires some way of reporting the current RNG step-count relative to the seed?

fn jump_ahead_pow2

It's possible to implement jump_ahead using a small number of jump_ahead_pow2 calls for any n (i.e. jump_ahead can be implemented for an RNG only supporting jump-by-power-of-2). Since we don't have to worry overly about speed here, I see no need to expose this method.

fn step_back

Is there a use-case for this?

vks commented 2 weeks ago

I'm afraid I still don't understand what use case would motivate such a feature.

If you generate a lot of numbers on one stream, say 10^12 or more, it would be useful to be able to restart the stream from that point in the event of e.g. a program crash, without having to redo all of those individual steps.

For this use case, wouldn't it be much simpler to log the state of the RNG? This would currently be possible by e.g. serializing the RNG to JSON.

It would also enable going in reverse (by jumping P-1 steps where P is the period). I could also see it potentially being useful to researchers, e.g. if they wanted to do something like testing for bias in every nth output for some large n.

This is an extremely specific use case, does it really have to be supported by a general-purpose library? A researcher could easily use their own implementation of the RNG. I'm also not aware of any statistical tests for RNGs that use jump-ahead.

newpavlov commented 2 weeks ago

Counter-based RNGs like ChaCha support efficient seeking and we expose it using the inherent set_word_pos method. I think it works fine and we do not a separate trait for seekable RNGs. We could introduce it, but right now it seems like more trouble than its worth.

cmcqueen commented 2 weeks ago

Perhaps a use-case is:

benwh1 commented 2 weeks ago

Counter-based RNGs like ChaCha support efficient seeking and we expose it using the inherent set_word_pos method. I think it works fine and we do not a separate trait for seekable RNGs. We could introduce it, but right now it seems like more trouble than its worth.

That makes sense. Perhaps it would be simpler for now just to add a jump ahead function to the xoroshiro generators, like what the ChaCha and Pcg structs already have, rather than trying to unify them all into a trait.

cmcqueen commented 2 weeks ago

I have added a simplerandom crate to match previous work in C and Python. It implements a jumpahead() function for all provided RNGs.

dhardy commented 1 week ago

@cmcqueen though that could be done, it's generally recommended to use less than the square root of the RNG's period for small generators.

The pattern I'd recommend as an alternative:

We already have all the functionality to support this.