imneme / pcg-cpp

PCG — C++ Implementation
Apache License 2.0

Correlations of different DXSM streams with increments coresidual to MULTIPLIER-1 #90

Open thynson opened 6 months ago

thynson commented 6 months ago

Michael has worked out some exploits of the DXSM output scrambler. See: https://guru.multimedia.cx/how-correlated-is-pcg-dxsm-output/

thynson commented 6 months ago

**UPDATE:** The following code only fails PractRand when seeded with 0; @michaelni provided an example below that fails for an arbitrary seed. It interleaves two correlated streams (with increments of 1 and MULTIPLIER, respectively):


#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define STATE_BITS 64
#define STATET uint64_t
#define OUTT uint32_t
#define MULTIPLIER 6364136223846793005ULL

/* DXSM output scrambler on top of a 64-bit LCG, producing 32-bit output */
static OUTT pcg_dxsm(STATET *state, STATET inc) {
    *state = *state * MULTIPLIER + inc;   /* advance the underlying LCG */

    OUTT h = *state >>   STATE_BITS / 2;  /* scramble the high half... */
    h     ^= h      >>   STATE_BITS / 4;
    h     *= MULTIPLIER;
    h     ^= h      >> 3*STATE_BITS / 8;
    return h * (*state | 1);              /* ...and multiply by the odd low half */
}

int main(int argc, char **argv) {

    uint32_t buff[4096];
    uint64_t seed = 0; // EDIT: it only fails PractRand quickly when seeded with 0
    STATET state1 = seed;
    STATET state2 = seed;
    while (1) {
        for (int i = 0; i < 2048; i++) {
            buff[2 * i] = pcg_dxsm(&state1, 1);
            buff[2 * i + 1] = pcg_dxsm(&state2, MULTIPLIER );
        }

        fwrite(buff, sizeof(uint32_t), 4096, stdout);
    }
}

This fails PractRand quickly:

RNG_test using PractRand version 0.95
RNG = RNG_stdin64, seed = unknown
test set = core, folding = standard (64 bit)

rng=RNG_stdin64, seed=unknown
length= 512 megabytes (2^29 bytes), time= 2.8 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN(2+0,13-1,T)                  R>+99999  p = 0           FAIL !!!!!!!!  
  BCFN(2+1,13-1,T)                  R>+99999  p = 0           FAIL !!!!!!!!  
  BCFN(2+2,13-2,T)                  R>+99999  p = 0           FAIL !!!!!!!!  
  BCFN(2+3,13-2,T)                  R=+14777  p =  4e-7552    FAIL !!!!!!!!  
  BCFN(2+4,13-3,T)                  R= +2007  p =  8.5e-950   FAIL !!!!!!!   
  BCFN(2+5,13-3,T)                  R=+256.8  p =  2.5e-121   FAIL !!!!!     
  BCFN(2+6,13-4,T)                  R= +31.1  p =  1.5e-13    FAIL           
  DC6-9x1Bytes-1                    R>+99999  p = 0           FAIL !!!!!!!!  
  Gap-16:A                          R>+99999  p = 0           FAIL !!!!!!!!  
  Gap-16:B                          R>+99999  p = 0           FAIL !!!!!!!!  
  FPF-14+6/16:(0,14-0)              R= +61.4  p =  2.0e-56    FAIL !!!!      
  FPF-14+6/16:(1,14-0)              R= +63.9  p =  9.0e-59    FAIL !!!!      
  FPF-14+6/16:(2,14-0)              R= +46.9  p =  4.7e-43    FAIL !!!       
  FPF-14+6/16:(3,14-0)              R= +40.7  p =  2.9e-37    FAIL !!!       
  FPF-14+6/16:(4,14-0)              R= +37.1  p =  5.4e-34    FAIL !!!       
  FPF-14+6/16:(5,14-1)              R= +27.5  p =  4.1e-24    FAIL !!        
  FPF-14+6/16:(6,14-2)              R= +16.3  p =  4.9e-14    FAIL           
  FPF-14+6/16:(7,14-2)              R= +15.3  p =  3.9e-13   VERY SUSPICIOUS 
  FPF-14+6/16:(8,14-3)              R= +14.2  p =  3.1e-12   VERY SUSPICIOUS 
  FPF-14+6/16:(9,14-4)              R=  +8.9  p =  3.3e-7   mildly suspicious
  FPF-14+6/16:all                   R=+117.9  p =  3.8e-110   FAIL !!!!!     
  BRank(12):256(4)                  R= +1339  p~=  7.1e-713   FAIL !!!!!!!   
  BRank(12):384(1)                  R= +1401  p~=  7.1e-423   FAIL !!!!!!!   
  BRank(12):512(2)                  R= +3017  p~=  3.3e-909   FAIL !!!!!!!   
  BRank(12):768(1)                  R= +3597  p~=  7e-1084    FAIL !!!!!!!!  
  BRank(12):1K(2)                   R= +7157  p~=  1e-2155    FAIL !!!!!!!!  
  BRank(12):1536(1)                 R= +7989  p~=  7e-2406    FAIL !!!!!!!!  
  BRank(12):2K(1)                   R=+10916  p~=  3e-3287    FAIL !!!!!!!!  
  mod3n(5):(1,9-6)                  R>+99999  p = 0           FAIL !!!!!!!!  
  mod3n(5):(2,9-6)                  R>+99999  p = 0           FAIL !!!!!!!!  
  ...and 199 test result(s) without anomalies
michaelni commented 6 months ago
uint32_t buff[4096];
uint64_t seed = time(NULL);
while (1) {

    STATET state1 = seed;
    STATET state2 = seed;

If you reset the seed inside the loop, it will generate the same 4096-entry block repeatedly; I assume that is unintended.

thynson commented 6 months ago

If you reset the seed inside the loop, it will generate the same 4096-entry block repeatedly; I assume that is unintended.

Good catch. I edited it a bit before pasting it here.

michaelni commented 6 months ago

If you reset the seed inside the loop, it will generate the same 4096-entry block repeatedly; I assume that is unintended.

Good catch. I edited it a bit before pasting it here.

If you fix a bug in the code, you need to re-run the PractRand test. The PractRand failure was likely because of the bug.

To see a real correlation with identical seeds, one of many possibilities is this:

    STATET state1 = seed;
    STATET state2 = seed;
    STATET inc1   = 1          - seed * (MULTIPLIER - 1);
    STATET inc2   = MULTIPLIER - seed * (MULTIPLIER - 1);
    while(1) {
        for (int i = 0; i < 2048; i++) {
            buff[2 * i    ] = pcg_dxsm(&state1, inc1);
            buff[2 * i + 1] = pcg_dxsm(&state2, inc2);
        }

        fwrite(buff, sizeof(uint32_t), 4096, stdout);
    }
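
A quick way to see why this pair correlates (a sketch of a check; the relation is my derivation, not something stated above): with these increments, the two LCG states stay locked in the affine relation state2 == MULTIPLIER*state1 - seed*(MULTIPLIER-1) (mod 2^64), generalizing the seed-0 case where state2 == MULTIPLIER*state1.

#include <assert.h>
#include <stdint.h>

int main(void) {
    const uint64_t M = 6364136223846793005ULL;  /* the PCG multiplier */
    uint64_t seed = 0x123456789abcdef0ULL;      /* any seed works */
    uint64_t s1 = seed, s2 = seed;
    uint64_t inc1 = 1 - seed * (M - 1);
    uint64_t inc2 = M - seed * (M - 1);
    for (long i = 0; i < 1000000; i++) {
        s1 = s1 * M + inc1;
        s2 = s2 * M + inc2;
        /* the affine relation holds at every step (mod 2^64) */
        assert(s2 == s1 * M - seed * (M - 1));
    }
    return 0;
}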
thynson commented 6 months ago

@michaelni is right. I first seeded them with 0, and that fails PractRand badly. Later I edited the code a bit to seed them from a timestamp, but I made a mistake that caused it to fail PractRand too; that failure was spurious.

imneme commented 6 months ago

No one who uses PCG in a sane way ever has to worry about these issues, as you need carefully contrived related seeds to show any issue, and if you're in the business of contriving carefully related seeding pairs, you can claim issues for pretty much all PRNGs (assuming you're allowed to set the complete PRNG state through seeding).

I recommend folks read this 2017 blog post of mine on the topic of PCG's streams. In that post I describe what's going on when you change the LCG additive constant, and how you can make LCG outputs that only differ by one bit if you contrive the constants just right.

(There's also the thread on the Numpy GitHub.)

Overall, all PRNGs exist in a space of trade-offs. We could work harder to scramble the LCG output so that even if a pair of LCGs have outputs that differ by just one bit, the final permuted output would still be radically different, but that scrambling costs CPU cycles. Since you only see an issue with carefully contrived related choices for the stream, it's not something that is worth trying to address with stronger scrambling.

One thing the PCG paper invites readers to do is design their own permutation functions. There's basically a cookbook showing you techniques you can apply and rules to follow (as it turns out before I ever wrote the PCG paper, Bob Jenkins provided similar advice, which boils down to the idea that mixing must be reversible). So if my permutations aren't strong enough for you, feel free to make your own. If you can permute more strongly and just as fast as the existing PCG permutations, feel free to share a pull request.

michaelni commented 6 months ago

First, I'd like to clarify that I intended to report the issues I found in a somewhat more coherent way, but I haven't found the time yet. Someone who wanted to help opened this issue; this is not what or how I planned to report things ;)

No one who uses PCG in a sane way ever has to worry about these issues,

I think this statement is a little dangerous, because "sane way" is not very well defined. But I'm certainly interested in how to initialize PCG-DXSM in a sane way if I want only non-correlated streams.

I was told the seed needs to be the same; now, for each given fixed seed, there's a wide range of differing increment pairs that show correlations.

as you need carefully contrived related seeds to show any issue, and if you're in the business of contriving carefully related seeding pairs, you can claim issues for pretty much all PRNGs (assuming you're allowed to set the complete PRNG state through seeding).

I recommend folks read this 2017 blog post of mine on the topic of PCG's streams. In that post I describe what's going on when you change the LCG additive constant, and how you can make LCG outputs that only differ by one bit if you contrive the constants just right.

Pairs that differ by one bit in the LCG output are a tiny subset of the cases that show correlations.

What I found concerning is that it seems not well understood how much of the space in PCG-DXSM consists of correlated streams. That's what I tried to look into a bit, by looking at what kinds of correlations there are, and in that respect, I found it interesting every time I found a new class of correlations. I have not deeply looked into how much this is a practical issue with randomly selected seeds and increments, but it seems there are in excess of 2^64 correlated streams for each randomly picked stream in pcg64-dxsm. Which one can see as a lot or as little, depending on point of view.

Thanks

imneme commented 6 months ago

The right thing to do (for any PRNG, not just PCG) is to use random entropy to initialize the PRNG state (in PCG's case, that's the current LCG-state value and the LCG increment). That's it, that's all. If you're not doing that, you're almost certainly doing it wrong, again for almost any PRNG.
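
A minimal sketch of that kind of seeding (assuming a POSIX /dev/urandom; it reuses pcg_dxsm() from the code earlier in this issue):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    uint64_t state, inc;
    FILE *f = fopen("/dev/urandom", "rb");          /* system entropy */
    if (!f || fread(&state, sizeof state, 1, f) != 1
           || fread(&inc,   sizeof inc,   1, f) != 1)
        return EXIT_FAILURE;
    fclose(f);
    inc |= 1;           /* the LCG increment must be odd for full period */
    /* ... now draw numbers with pcg_dxsm(&state, inc) ... */
    return 0;
}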

For SplitMix, for example (which is a permuted counter-based PRNG), if we just switch the stream (a.k.a. its increment, or “gamma” as they call it) to the wrong (hand-chosen) value, we may end up with the same generator going backwards, or going forwards at half or double speed, which will obviously create correlations. The standard SplitMix implementation makes such errors harder by usually using its own internal state to choose the stream when you split, but it's not hard to make it do the wrong thing if we put our minds to it.
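
A toy sketch of those SplitMix failure modes (my illustration, using the standard splitmix64 finalizer; the seed 42 is arbitrary): gamma 2g replays the same stream at double speed, and gamma -g replays it backwards.

#include <stdint.h>
#include <stdio.h>

static uint64_t mix(uint64_t z) {                  /* splitmix64 finalizer */
    z ^= z >> 30; z *= 0xbf58476d1ce4e5b9ULL;
    z ^= z >> 27; z *= 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

static uint64_t next(uint64_t *state, uint64_t gamma) {
    *state += gamma;        /* the whole state transition is one addition */
    return mix(*state);
}

int main(void) {
    const uint64_t g = 0x9e3779b97f4a7c15ULL;      /* usual "golden" gamma */
    uint64_t s1 = 42, s2 = 42, s3;
    uint64_t a = next(&s1, g);
    uint64_t b = next(&s1, g);
    uint64_t c = next(&s1, g);
    uint64_t d = next(&s1, g);

    uint64_t y1 = next(&s2, 2 * g);                /* double-speed stream */
    uint64_t y2 = next(&s2, 2 * g);
    printf("double speed: %d %d\n", y1 == b, y2 == d);          /* 1 1 */

    s3 = s1;                                       /* 42 + 4g */
    uint64_t ng = 0 - g;                           /* -g mod 2^64 */
    uint64_t z1 = next(&s3, ng);                   /* backwards stream */
    uint64_t z2 = next(&s3, ng);
    uint64_t z3 = next(&s3, ng);
    printf("backwards: %d %d %d\n", z1 == c, z2 == b, z3 == a); /* 1 1 1 */
    return 0;
}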

So what's going on with PCG's streams is similar to some other generators of the past. As another example, you'd find similar correlations in Marsaglia's XorWow for different settings of the counter variable. XorWow is worse, because it doesn't do any permutation at all after adding in its counter.

In PCG's case, a way to think of the stream selected is that, in effect, we're choosing a different output permutation for the base PRNG. Generally speaking, if you pick two output permutations at random, they'll look very different even when the underlying base PRNGs are as perfectly lined up as they can be, but if you are somehow weirdly unlucky, they might not be that different.

Likewise, for the underlying LCG sequence, your randomly-chosen starting points will likely be quite different from each other and not at all aligned. But it's possible you could be catastrophically unlucky and pick two nearby starting points and overlap.

But if you pick a random starting point and a random “stream”, you've made it even more unlikely that you somehow choose wrong. And the other thing is that these streams are incredibly cheap; they cost almost nothing in terms of execution time—LCG or MCG, we're talking about 1.25 ns per 64-bit random number (with an AMD Ryzen 9 7900X, but pretty similar on current Apple CPUs) before doing anything clever like using vector instructions.

(I think this is all pretty well covered by that 2017 blog post.)

michaelni commented 6 months ago

The right thing to do (for any PRNG, not just PCG) is to use random entropy to initialize the PRNG state (in PCG's case, that's the current LCG-state value and the LCG increment). That's it, that's all. If you're not doing that, you're almost certainly doing it wrong, again for almost any PRNG.

OK, so suppose we randomly initialize a PRNG with 128+127 bits (that being seed and increment/stream selector) and draw, let's say, a billion values from it. How often can we do this before our set contains a correlated pair of streams? With an ideal/perfect PRNG, I think we would roughly need sqrt(2^(128-30) × 2^127) ~ 2^112 tries (simply the birthday paradox hitting overlapping sequences) for the 64-bit output / 128-bit state case, and 2^48 for the 32-bit output / 64-bit state case. What can we expect to be able to draw out of PCG-DXSM here before we see correlations? I'm asking this because it's a clearer way to quantify the amount of correlation, because I have not seen it stated anywhere, and because at the moment I have a feeling these correlations occur below that value, but no proof. (I don't want to waste my time trying to prove that correlations occur in a random set when that's already known/expected; on the other hand, if it's not known, that's an interesting exercise.)
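
Unpacking the exponent arithmetic in that estimate (my reading: the 128−30 accounts for each billion-value run occupying about 2^30 positions of the 2^128 state space):

\sqrt{2^{128-30} \cdot 2^{127}} = \sqrt{2^{225}} = 2^{112.5} \approx 2^{112},
\qquad
\sqrt{2^{64-30} \cdot 2^{63}} = \sqrt{2^{97}} = 2^{48.5} \approx 2^{48}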

imneme commented 6 months ago

First, let's just think about how much compute that is in this best case. Let's assume 1 ns per random number; that's (2^112 × 2^30) ns. And if you want to store all the results, it's (2^112 × 2^30 × 2^64) bits; if we use one atom of silicon per bit somehow, that's going to be about a million stellar masses worth to store them all.

Let's suppose that somehow PCG DXSM were only half that, with you only needing to collect 2^56 separate outputs. Now you just need to devote 2.5 billion years of compute to your problem, and your one-atom-per-bit computer is just going to need 600 quadrillion tons of silicon, if you can somehow manage 1 atom per bit. There's enough silicon in Earth's crust to do that, but obviously the number goes up if you need more than one atom. Then you're going to run your algorithm to hunt for the correlations. Let's hope it's somehow O(n).

But then we have another interesting question. If we've got that much storage and that much compute, what would happen if we took other PRNGs and looked for (possibly weak) correlations in their outputs. What would we find? As I recall, some evidence during the Numpy discussion showed that (some) LFSRs can also be correlated even when not exactly overlapping.

You also mention using DXSM for smaller-state PRNGs. But that would be a bit silly. When the main parts of your state fit in just one 64-bit register, it's much easier to use RXSM. Part of the point of the 128-bit versions is to pick a permutation that is a bit weaker, because it's good enough and cheaper.

Finally, let's switch gears, and imagine we have someone who uses some awesome PRNG, and decides to pass the output of time() to seed their PRNG (perhaps through std::seed_seq to fill out all the state). And let's suppose they run their code on everyone's smart phone. Or as another scenario, let's suppose a researcher plucks a “favorite” number out of their mind for the seed for their research. What does the distribution of those seeds look like compared to, say, all possible seedings? How often do these choices occur when people write software? How many times is everyone using the exact same numbers, even when they had a perfect PRNG? Or what if they just made a single call to std::random_device and seeded with a 32-bit number?

michaelni commented 6 months ago

Let's suppose that somehow PCG DXSM were only half that, with you only needing to collect 2^56 separate outputs. Now you just need to devote 2.5 billion years of compute to your problem, and your one-atom-per-bit computer is just going to need 600 quadrillion tons of silicon, if you can somehow manage 1 atom per bit. There's enough silicon in Earth's crust to do that, but obviously the number goes up if you need more than one atom. Then you're going to run your algorithm to hunt for the correlations. Let's hope it's somehow O(n).

Ultimately, a pencil and a sheet of paper would do. The goal is to find some bounds on the correlated sequences, and I think this is useful, as this PRNG is used by default in a few places. I think users should know whether they should expect a correlated sequence in 2^112 or in 2^80 random picks. People also list the period of PRNGs, and not how much silicon it would need to store the whole period.

But then we have another interesting question. If we've got that much storage and that much compute, what would happen if we took other PRNGs and looked for (possibly weak) correlations in their outputs. What would we find? As I recall, some evidence during the Numpy discussion showed that (some) LFSRs can also be correlated even when not exactly overlapping.

There's a large number of weak PRNGs, and it takes effort to analyze each specific PRNG (and because my galaxy-sized computer collapsed into a black hole, I cannot brute-force compute these correlations). For plain LFSRs, like the Mersenne Twister, there are obviously linear dependencies. Personally, I use SFC64 nowadays for non-security uses, but I have spent little time analyzing it; I only wrote some code to compute its state from its output.

Finally, let's switch gears, and imagine we have someone who uses some awesome PRNG, and decides to pass the output of time() to seed their PRNG (perhaps through std::seed_seq to fill out all the state). And let's suppose they run their code on everyone's smart phone. Or as another scenario, let's suppose a researcher plucks a “favorite” number out of their mind for the seed for their research. What does the distribution of those seeds look like compared to, say, all possible seedings? How often do these choices occur when people write software? How many times is everyone using the exact same numbers, even when they had a perfect PRNG? Or what if they just made a single call to std::random_device and seeded with a 32-bit number?

People do interesting things; my bank encrypted my data sent over email with my birthday as the password, for example. I tried telling them that all living people don't have a whole lot of different birthdays, but they never replied.

lemire commented 6 months ago

I think users should know whether they should expect a correlated sequence in 2^112 or in 2^80 random picks.

@imneme's point was that the exploit you found was not something that people had to worry about (No one who uses PCG in a sane way ever has to worry about these issues, as you need carefully contrived related seeds to show any issue, and if you're in the business of contriving carefully related seeding pairs, you can claim issues for pretty much all PRNGs).

I am curious: do you disagree with her point?

michaelni commented 6 months ago

@lemire, I did not open this issue. What I intended was, once I finished my investigation of the stream correlations, to report them. It was just some reader of my blog who opened this issue (and I'm certain he just wanted to help, but it caused a bit of confusion, I think). The code posted here does not represent very well what I found, by the way.

I was quite surprised by the amount of correlations in PCG-DXSM, and I do not believe that all PRNGs have such issues. I mean, I can pick two seeds at random and find many increments that result in correlated sequences. I can pick a seed and an increment and generate as many correlated seed/increment pairs as I could store. I can definitely not do that with some other PRNGs.

The thing people need to worry about is if they generate seeds and increments in some non-random way.

What I will do is look more into this as my time permits, and I'll report things I find. Only after that can I say if there's anything people need to worry about. This is "work in progress".

lemire commented 6 months ago

@michaelni That's a fair answer and I think you are saying "maybe it matters".

imneme commented 6 months ago

I think users should know whether they should expect a correlated sequence in 2^112 or in 2^80 random picks.

I think more insight into how things work is overall a good thing, and investigating these kinds of questions is interesting, or even fascinating. So I think it’s cool to see folks get interested in PRNGs and explore, and there are numerous rabbit holes to head down.

But if I were making a list of things typical PRNG users “should know”, that nuance would be pretty far down the list. As I mentioned earlier, far higher on my list would be having them understand that a single 32-bit number is inadequate to seed any PRNG, as would understanding that you can’t just use rng() % range. I’d also love to see more folks think about whether they’re doing the right thing when they try to generate a random float in the range [0,1).
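
For the rng() % range point, here is a minimal sketch of the usual rejection-sampling fix (an illustration with a stand-in generator, not code from this repository): 2^32 is not divisible by most ranges, so plain % overrepresents the smaller results, and rejecting the uneven remainder restores uniformity.

#include <stdint.h>
#include <stdio.h>

/* stand-in 32-bit source: high half of a 64-bit LCG step */
static uint64_t s = 0x123456789abcdefULL;
static uint32_t rng32(void) {
    s = s * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(s >> 32);
}

/* unbiased uniform value in [0, range), range >= 1 */
static uint32_t bounded(uint32_t range) {
    uint32_t limit = -range % range;        /* == 2^32 mod range */
    uint32_t x;
    do { x = rng32(); } while (x < limit);  /* reject the uneven low part */
    return x % range;
}

int main(void) {
    /* e.g., range 3: plain % maps 2^32 values onto buckets of sizes
       1431655766/1431655765/1431655765, a small bias that grows with
       the range; bounded() is exactly uniform */
    long counts[3] = {0, 0, 0};
    for (long i = 0; i < 3000000; i++)
        counts[bounded(3)]++;
    printf("%ld %ld %ld\n", counts[0], counts[1], counts[2]);
    return 0;
}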

One other thing is that while someone using the Frontier supercomputer possibly does need to think about potential issues when drawing more than 2^80 random numbers (since their machine might consume 2^60 numbers per second), most folks are never going to be in that situation, so it’d be a moot concern. Alas, that mootness might not be apparent—in my experience, people often don’t come to sensible conclusions when given a possible issue to worry about.

To give some other examples, some folks are fearful of randomized algorithms because they worry that infinitesimal probabilities “might happen”. Told that there’s a “3.8 × 10^-66 chance that random-pivot quicksort might use more than 100 stack slots when sorting 1000 numbers”, some folks will say “oh, so there is a chance”, when in practice this kind of probability equates to “don’t worry about it, it’ll never happen”.

Closer to home, I saw some fear-based thinking in your blog post where you wrote:

JSF64 is omitted because its minimal cycle length is unknown, and that's like using code that very likely will work instead of knowing it will work

I recommend you read the analysis of JSF in my blog post Random Invertible Mapping Statistics. In particular, I show there would only be a short cycle in jsf32 if a 281-trillion-to-one chance occurrence happened in the design of the generator (not each time you use it, a one-time chance when it was designed). For jsf64, if we assume you'll grab 2^64 numbers from each seed, the odds that he included a short cycle are 340 billion billion billion billion to one. I'm incredibly confident that Bob Jenkins wasn't that unlucky.

So to my eye, you rejected a perfectly good generator that has withstood some of the most targeted testing of any out there. (Fun fact, according to David Blackman, a fair amount of effort in PRNG testing was done just to try to “bust this little PRNG”.)

In short, often people can get obsessed with worrying about vanishingly unlikely things, while at the same time being blind to issues that might have real impact.

But stepping back, the high-order bit is that I’m pleased to see your work, and I look forward to seeing what you find. PCG has always been considered as a family with the idea that new family members can be added. If it turns out that some audiences want even stronger scrambling, we can see what we can come up with. We always know that we could resort to applying a maximal-avalanche hash function, but the question is always whether we can get what we need with less compute than that, because the big concern for many people is how much it costs to generate a number.

michaelni commented 6 months ago

To give some other examples, some folks are fearful of randomized algorithms because they worry that infinitesimal probabilities “might happen”. Told that there’s a “3.8 × 10^-66 chance that random-pivot quicksort might use more than 100 stack slots when sorting 1000 numbers”, some folks will say “oh, so there is a chance”, when in practice this kind of probability equates to “don’t worry about it, it’ll never happen”.

I don't know if this fear is baseless. There's CVE-2017-1000373 about the OpenBSD qsort, for example.

JSF64 is omitted because its minimal cycle length is unknown, and that's like using code that very likely will work instead of knowing it will work

I've removed all specific mentions of omitted PRNGs in that article and replaced them with generic and more concise text. I think that reduces some unintended ways of reading it.

I recommend you read the analysis of JSF in my blog post Random Invertible Mapping Statistics. In particular, I show there would only be a short cycle in jsf32 if a 281-trillion-to-one chance occurrence happened in the design of the generator (not each time you use it, a one-time chance when it was designed). For jsf64, if we assume you'll grab 2^64 numbers from each seed, the odds that he included a short cycle are 340 billion billion billion billion to one. I'm incredibly confident that Bob Jenkins wasn't that unlucky.

I don't know what the definition of a random invertible mapping is. I would imagine that's a list of 2²⁵⁶ distinct 256-bit values, in this case filled with values from a true random number source. I'm not sure a simple PRNG sufficiently matches this definition. Also, either way, it's still not a proof of its minimal cycle length. I do like math, and while there are many things that do not have proofs, for me, having a proof, or knowing there is one, is in itself something I favor.

So to my eye, you rejected a perfectly good generator that has withstood some of the most targeted testing of any out there. (Fun fact, according to David Blackman, a fair amount of effort in PRNG testing was done just to try to “bust this little PRNG”.)

I'm sure there are many great PRNGs I did not list, because I don't know them, or haven't had the time to investigate them, or maybe even because I spent too much time investigating them :)

imneme commented 6 months ago

I don't know if this fear is baseless. There's CVE-2017-1000373 about the OpenBSD qsort, for example.

Perhaps you glanced over that CVE too quickly—it is specifically about a non-randomized quicksort.

I don't know what the definition of a random invertible mapping is.

First, a random mapping is another name for a (randomly constructed) endofunction, but typically viewed through the lens of a directed pseudoforest, where we see the function as a graph and consider questions like the number of cycles, distance to the first cycle, node in-degree, etc. These are discussed in the paper Random Mapping Statistics and the same authors' book, Analytic Combinatorics (section VII.3.3, pages 462-467).

As in all discussions of functions, the word invertible means “the function has an inverse”. In other words, a random invertible mapping is another name for a bijection/permutation, but again seen from more of a graph perspective, where each vertex has in-degree and out-degree of exactly one. For a PRNG, it is a graph connecting next/previous states.

But although my Google-fu had failed me when I wrote that article (causing me to just derive things myself), it didn't fail me tonight—it turns out that I should have searched for Random Permutation Statistics (duh!).

For the analysis of JSF, we just need the fact that “a random permutation of length at least m contains on average 1/m cycles of length m” (or an equivalent result) for it to be trivial to derive my claim that the “number of nodes on cycles of length < 2^k is 2^k”.
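
Spelling that out (the arithmetic is mine, using the quoted fact): each cycle of length m contributes m nodes, so

\mathbb{E}\big[\#\{\text{nodes on cycles of length} < 2^k\}\big]
    = \sum_{m=1}^{2^k-1} m \cdot \frac{1}{m}
    = 2^k - 1 \approx 2^k.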

Hopefully that strengthens your sense of proof.

I'm sure there are many great PRNGs I did not list.

JSF is worth knowing about, not least because Bob Jenkins has so much good stuff on his delightfully “old-school” website. Although his pages on PRNGs are most relevant to this conversation, the page helping people grasp orders of magnitude and polynomial vs exponential is also good stuff. In particular, you might like his October 2007 talk on PRNG design where he says “no nice math”. Specifically, his slides say “nice math causes long range patterns”, so that might resonate with you, and perhaps leave you wondering how much self-similarity there might be in other PRNGs just waiting to be uncovered.

michaelni commented 6 months ago

I don't know what the definition of a random invertible mapping is.

First, a random mapping is another name for a (randomly constructed) endofunction, but typically viewed through the lens of a directed pseudoforest, where we see the function as a graph and consider questions like the number of cycles, distance to the first cycle, node in-degree, etc. These are discussed in the paper Random Mapping Statistics and the same authors' book, Analytic Combinatorics (section VII.3.3, pages 462-467).

That looks interesting. If I find the time, I'll read these.

For the analysis of JSF, we just need the fact that “a random permutation of length at least m contains on average 1/m cycles of length m” (or an equivalent result) for it to be trivial to derive my claim that the “number of nodes on cycles of length < 2^k is 2^k”.

Hopefully that strengthens your sense of proof.

The problem I have is that a human-written PRNG is not a random sample from all possible “random invertible mappings”. I can prove my point by simply pointing to an LFSR, an LCG, the simple identity function, or others. None of them follow the claimed probability particularly closely. So why would this work for jsf?

michaelni commented 6 months ago

Back on the topic of PCG-DXSM: I have finished (or I hope I have) my investigation of the correlations and summarized them on my blog: https://guru.multimedia.cx/correlations-in-randomly-initialized-pcg-dxsm/ I hope this is useful and I'm not annoying anyone, which was not my intent. Also, I'm not sure this issue is the correct place to report this, but as the discussion started here, it seemed best to follow up here.

In summary, there are many correlations, and it seems to me this could possibly affect specific use cases on supercomputers, or otherwise anything needing a really large number of random numbers. Also, it seems there are correlations within the 2^128 period of a single stream, which one could hit if one cloned a stream and moved one of the two by a multiple of a large power of 2.

Of course, I do hope I made a mistake and none of this affects anyone, or that my code was just buggy. But I think the issue is real.

imneme commented 6 months ago

The problem I have is that a human-written PRNG is not a random sample from all possible “random invertible mappings”. I can prove my point by simply pointing to an LFSR, an LCG, the simple identity function, or others. None of them follow the claimed probability particularly closely. So why would this work for jsf?

You raise a good point, but I think this is also why Bob Jenkins had the rule “no nice math”. The problem is that nice math does indeed introduce regularities, which is why you want the state-to-state transformation to be more chaotic.

But, FWIW, as I mentioned in the PCG paper, the usual way to study this is to build smaller-scale versions of your PRNG and see what they do. If your small-scale versions pretty much always match theory, then you can expect the larger-scale ones to do the same. This implementation I made has jsf16 and jsf8. You can fully explore the state space of jsf8, as it's only 2^32.
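
A sketch of that small-scale methodology (my illustration with a toy invertible 16-bit xorshift-style map, not jsf8 itself; the shift constants are arbitrary): because the map is a bijection, every state lies on exactly one cycle, so we can enumerate the full cycle structure and compare it against random-permutation statistics.

#include <stdint.h>
#include <stdio.h>

static uint16_t step(uint16_t x) {     /* any xorshift map is invertible */
    x ^= (uint16_t)(x << 7);
    x ^= (uint16_t)(x >> 9);
    x ^= (uint16_t)(x << 8);
    return x;
}

int main(void) {
    static uint8_t seen[1u << 16];
    long cycles = 0, shortest = 1L << 17;
    for (uint32_t start = 0; start < (1u << 16); start++) {
        if (seen[start]) continue;     /* already on a known cycle */
        long len = 0;
        uint16_t x = (uint16_t)start;
        do { seen[x] = 1; x = step(x); len++; } while (x != (uint16_t)start);
        cycles++;
        if (len < shortest) shortest = len;
        printf("cycle of length %ld\n", len);
    }
    printf("%ld cycles, shortest %ld\n", cycles, shortest);
    return 0;
}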

imneme commented 6 months ago

Back on the topic of PCG-DXSM: I have finished (or I hope I have) my investigation of the correlations and summarized them on my blog: https://guru.multimedia.cx/correlations-in-randomly-initialized-pcg-dxsm/ I hope this is useful and I'm not annoying anyone, which was not my intent. Also, I'm not sure this issue is the correct place to report this, but as the discussion started here, it seemed best to follow up here.

In summary, there are many correlations, and it seems to me this could possibly affect specific use cases on supercomputers, or otherwise anything needing a really large number of random numbers. Also, it seems there are correlations within the 2^128 period of a single stream, which one could hit if one cloned a stream and moved one of the two by a multiple of a large power of 2.

Of course, I do hope I made a mistake and none of this affects anyone, or that my code was just buggy. But I think the issue is real.

Thanks for your efforts on this. One thing that's a bit odd in your article is that you recommend XSM64, but that is described in the PractRand docs as follows:

The underlieing state transition function is a simple linear 64 bit (for xsm32) or 128 bit (for xsm64) LCG with very poor constants (to maximize speed). The LCG uses a seed-dependent additive constant in order to expand the statespace. The poor quality of the state transition function is made up for by a relatively complex non-linear output function. It looks like the state transition function ought to be too poor to be completely fixed by the output function, and like the the statespace expansion trick ought to produce excessively correlated states, but I have not managed to find any such issues in practice.
Update: found interstate correlation issues and in some cases bad seeds in PractRand 0.93, changing these algorithms in PractRand 0.94+ to eliminate the problems.

To my eye, XSM64 is also a permuted [linear] congruential generator, just with a stronger output function. (FWIW, I was ignorant of PractRand and its XSM generators when I wrote the PCG paper—most of the academic stuff I found was focused on TestU01.)

A reality is that for any PRNG with only 128 bits of changing state, theory says you'll be able to observe a statistical anomaly after “only” 2^64 128-bit outputs, well within the reach of our supercomputer. You just use the birthday test, and you don't even need to store all 2^64 outputs, just compute them. Either you'll hit a failure to generate an expected repeat (which is a statistical issue) or your generator is not uniform (also a problem). It literally has to be one or the other. (You can make 128-bit outputs by pairing two 64-bit outputs X and Y to make XY and YX.)
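
A small-scale sketch of that birthday argument (my illustration, scaled down to a generator that outputs its entire 32-bit state): xorshift32 permutes its nonzero states, so it can never repeat an output within its period, whereas a uniform source of 2^20 32-bit values should show about (2^20)^2 / 2 / 2^32 = 128 repeats. It fails the birthday test “low”.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint32_t xs = 2463534242u;
static uint32_t xorshift32(void) {     /* Marsaglia's xorshift32 */
    uint32_t x = xs;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    return xs = x;
}

static int cmp32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

int main(void) {
    enum { N = 1 << 20 };
    uint32_t *v = malloc((size_t)N * sizeof *v);
    if (!v) return EXIT_FAILURE;
    for (long i = 0; i < N; i++) v[i] = xorshift32();
    qsort(v, N, sizeof *v, cmp32);     /* sort, then count adjacent repeats */
    long repeats = 0;
    for (long i = 1; i < N; i++) repeats += (v[i] == v[i - 1]);
    printf("repeats: %ld (uniform expectation ~128)\n", repeats);
    free(v);
    return 0;
}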

If you check the literature (e.g., various articles by L'Ecuyer), in general people say not to use more than the square root of the period's worth of numbers from any PRNG. In practice you can get away with more, but it's always going to be situational—if you give me a supercomputer and any 2^128-period PRNG, I'll happily fail it.

As I think I've said, it's always all about trade-offs. You can have more changing state or stronger output permutations, but it costs compute. What gives you the most bang for your compute buck is always something to consider, especially as it changes over time. And if you do need to generate more than 2^64 random numbers, you should probably do some homework on those trade-offs.

michaelni commented 6 months ago

A reality is that for any PRNG with only 128 bits of changing state, theory says you'll be able to observe a statistical anomaly after “only” 2^64 128-bit outputs, well within the reach of our supercomputer. You just use the birthday test, and you don't even need to store all 2^64 outputs, just compute them. Either you'll hit a failure to generate an expected repeat (which is a statistical issue) or your generator is not uniform (also a problem). It literally has to be one or the other. (You can make 128-bit outputs by pairing two 64-bit outputs X and Y to make XY and YX.)

If you check the literature (e.g., various articles by L'Ecuyer), in general people say not to use more than the square root of the period's worth of numbers from any PRNG. In practice you can get away with more, but it's always going to be situational—if you give me a supercomputer and any 2^128-period PRNG, I'll happily fail it.

First, I'd like to clarify: I'm not really recommending any specific PRNG as a solution here. But XSM64 has a 256-bit state and a 128-bit seed. And if we believe the source code:

//no two seeds are closer than 2**127 from each other on the same cycle

So I don't think a sqrt(seed-space) birthday attack will work.

michaelni commented 6 months ago

To my eye, XSM64 is also a permuted [linear] congruential generator, just with a stronger output function. (FWIW, I was ignorant of PractRand and its XSM generators when I wrote the PCG paper—most of the academic stuff I found was focused on TestU01.)

XSM64 seems to use (1<<64) + 1 as the multiplier in the LCG, which in an implementation with 64-bit variables simplifies to a single addition. So it is an LCG, yes. XSM64 uses the LCG state both before and after the step (that is different from PCG), and it also feeds the increment into the mixer, which makes the attack I used less trivial. The two increments of the two LCGs give us the offset between the LCGs, so to pick a bad case with a bad offset we need specific increments (be that from a birthday-style attack or otherwise), and in PCG this feeds the mixer with a trivial offset. But in XSM64 the increments get mixed in on top, which seems to thwart that attack: with XSM64 one would have to find "bad" increments and a "bad" offset, but as they depend on each other, the number of such cases is likely smaller. So to me it seems almost as if XSM64 was designed with this attack in mind, and that's why I was a bit fascinated by it.
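
A minimal sketch of that simplification (my reading of the description above, with a two-word layout of my own; this is not XSM64's actual code, and the self-check uses the GCC/Clang unsigned __int128 extension): multiplying a 128-bit state by 2^64 + 1 gives state + (state << 64) mod 2^128, and the shifted term only touches the high word.

#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

static void lcg_step(u128 *s, uint64_t inc) {
    s->hi += s->lo;          /* multiply by 2^64 + 1: hi += lo, lo unchanged */
    uint64_t old = s->lo;
    s->lo += inc;            /* then add the increment ... */
    s->hi += (s->lo < old);  /* ... with carry into the high word */
}

int main(void) {
    u128 s = { 0x0123456789abcdefULL, 0xfedcba9876543210ULL };
    unsigned __int128 ref = ((unsigned __int128)s.hi << 64) | s.lo;
    const uint64_t inc = 0xdeadbeefULL | 1;
    lcg_step(&s, inc);
    ref = ref * ((((unsigned __int128)1) << 64) + 1) + inc;
    /* returns 0 when the word-level step matches the 128-bit arithmetic */
    return !((uint64_t)ref == s.lo && (uint64_t)(ref >> 64) == s.hi);
}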

Also, I think XSM64 placing complexity in the mixer, and not in the LCG multiplier, is a wise choice. I really think XSM64 should be looked at before creating a new LCG+mixer.