Closed peteroupc closed 2 years ago
@dhardy, @vigna, @lcrocker, @dj-on-github , can you review the random number generation articles I discuss in this issue, and answer the comments I give in this issue's opening post, if possible? If you know anyone else familiar with RNGs, can you tell them about my articles?
Eh, it's... complicated. All this stuff is complicated. I think your guidelines are fine, but, for example, you discuss random seeding, and random seeding without a minimum suggestion of independence of pairs of subsequences and long period is dangerous. LCGs/MCGs with power-of-2 moduli are known to be very bad in this respect, as states with the same k lower bits generate sequences with he same k lower bits, yet you do not appear to make any difference between MCGs (or MRGs, like MRG32k3a) with prime modulus and LCGs with power-of-2 moduli, and put power-of-2 LCGs and scrambled variants like PCG in the same class of, say, linear generators, for which we have evidence of some independence.
I think you should define formally your requirements for "high-quality PRNGs" and start from there. But if you plan to use multiple sequences from the same PRNG you cannot enlist algorithms that are known to have massive internal correlation. There's a reason why MRG32k3a is based on a multiple multiplicative recurrence with prime moduli...
Thank you for the review.
I think you should define formally your requirements for "high-quality PRNGs" and start from there.
Note that in earlier versions of the RNG recommendations document, only a PRNG that "either satisfies the collision resistance property or is significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of BigCrush" was considered high-quality; the current version now speaks of a period at or close to the maximum possible, rather than passing BigCrush.
Also, do you have answers to the questions I give in the opening post here?
Er... "collision resistant" is a property of hash functions, not on PRNG.
Well, that's not a review, just something odd I noticed...
My RNG recommendations article was updated, especially the sections "High-Quality PRNG Examples" and "Seed Generation for Noncryptographic PRNGs".
Do you have answers to the questions I give in this issue's opening post?
Still, I'm totally confused.
You mention a large period (i.e., close to the number of possible seeds) as a property of high-quality PRNGs, yet you list Jenkin's jsf, which has fixed points (seeds which will give the constant sequence as output, that is, period 1). In particular, there is no guarantee that starting from a given seed the period will be large. Similarly, SFC has a period of at least 2^64 (because of the counter), but the state is 256 bits, so they do not seem to satisfy your requirements.
To answer your questions: simulation often uses pseudorandom numbers at a speed that is comparable with a fast generator, so a crypto generator would slow down the computation. Billions of IoT devices run on batteries and when they have to generate a pseudorandom number they prefer by far an algorithm with a small number of instructions (and corresponding lesser power consumption).
The same is true of memory usage: languages like Erlang or Go in which you can easily create million of language-native threads are very conscious of the size of the PRNG of each thread (e.g., 128 bits of state might be acceptable, 256 bits of state might be not). Reasonably fast crypto generators use a lot of internal state to vectorize operations, so they might not be acceptable.
Before you wrote your comment, I recently updated the requirements for high-quality PRNGs and eliminated any references to "state length"; rather the concept that replaces "state length" is the number of different seeds the PRNG admits. (By "seeds" I mean non-degenerate seeds.)
Current text:
'has a period (maximum size of a "random" number cycle) at or close to the number of different seeds the PRNG admits'
...and I'm out of this discussion 🤷🏻♂️.
BTW, I have to make you notice again that "collision resistance" does not make any sense for a PRNG.
I use collision resistance here to refer to PRNGs that internally act like hash functions (akin to MurmurHash or cryptographic hashes) rather than mere mixing functions. Is the term "avalanche property" better here?
The "High-Quality PRNG Examples" section was reorganized into a table and moved into the "Guidelines for New RNG APIs" section. It has five columns:
Additional PRNGs can be added here if they meet the requirements for high-quality PRNGs in the article.
Also, the High-Quality PRNG requirements no longer speak of "collision resistance".
OK, this looks much much better! Nice work.
Corrections:
Published a new document, Testing PRNGs for High-Quality Randomness, and an update to "Random Number Generator Recommendations in Applications".
Adding @imneme, letting @lemire know.
The Mersenne Twister (as any other linear generator) can jump ahead by any number of steps using a jump polynomial. If you precompute the polynomial, the jump happens in logarithmic time.
As it turns out, information on the exact steps to precompute the jump polynomial is surprisingly hard to find, and there are also certain subtleties involved.
For example, I haven't found a way to calculate the characteristic polynomial of a binary matrix in the field F2, and I am not aware of what the exact characteristic polynomial is for the xorshift, xoroshiro, or xoshiro families of PRNGs.
That's why I started "Notes on Calculating Jump Parameters for Some PRNGs" today, in an effort to bring together all the steps needed to calculate a jump polynomial for any F2-linear PRNG. Please review it.
Ahem...
"Convert the characteristic polynomial to an integer (CP), so that its least significant bit is the 0-order coefficient, its next bit is the 1st-order coefficient, and so on."
Why would you ever want to do that?
The jump thing is clearly explained in my paper
https://doi.org/10.1016/j.cam.2016.11.006
I don't think that putting it into a text file without proper mathematical notation is helping...
"Convert the characteristic polynomial to an integer (CP), so that its least significant bit is the 0-order coefficient, its next bit is the 1st-order coefficient, and so on."
Why would you ever want to do that?
Some APIs don't necessarily return the characteristic polynomial as an integer (which is a more convenient form for the rest of the process I describe). For example, SymPy's charpoly
outputs a polynomial symbol, and each of its terms and coefficients is likewise a symbol.
I am in the process of computing characteristic polynomials and jump polynomials at multiple points for the xoroshiro and xoshiro families, and will publish them in due course.
(Sorry, I edited your comment by accident, rather than adding a new reply. Restored.)
I'm sorry, but you're making a gigantic confusion. Nobody "converts" a polynomial to an integer. A polynomial is a polynomial—it is an abstract algebraic object defined by certain universal properties. We represent it as a sequence of bits, packed into a few integers, for convenience. The computations on a polynomial happen in the ring of polynomials. There are no integers involved.
The "Notes on Jumping PRNGs Ahead" (as it's now called) has been modified to no longer speak of "converting" a polynomial to an integer.
I have updated the starting post of this issue. Please look at it again, where I give a list of requests and open questions about my articles, especially my articles on exact random sampling.
Letting them know: @maciej-bendkowski, @christoph-conrads .
I have updated the starting post of this issue. Please look at it again, where I give a list of requests and open questions about my articles, especially my articles on exact random sampling.
Of the articles mentioned here, Randomization and Sampling Methods and More Random Sampling Methods combined are very long (about 230 KB in size combined). I would like to reduce their size while maintaining the most relevant algorithms for random variate generation.
Here are my goals for both articles:
The requests and open questions for all my articles are now in a dedicated page: Requests and Open Questions.
@PaulSanchez : Since you are very familiar with simulation and random variate generation, especially on Stack Overflow, do you have comments on my Randomization and Sampling Methods article or my other articles on randomness in my GitHub repository here?
@lmendo : Thanks for responding, though you must have deleted the comment right afterwards. Did you see the page with the requests and open questions? If so, do you have responses for those questions?
@peteroupc I deleted my comment because I realized your message was for Paul J. Sánchez, not for me. Anyway, what I said in it was that those two pages are a lot of material for me to read... I have just taken a look at the requests for Bernoulli factories and I can't suggest anything off the top of my head. I'll let you know if something comes to mind
@peteroupc Regarding your comment on simulating concave Lipschitz continuous functions (which I can't find on GitHub now): I couldn't follow your reasoning, but if the result holds I think it's worth publishing in a journal. Do you just prove that there exists a simulation algorithm for that case, or do you also explicitly obtain the algorithm?
I moved it to the appendix of "Supplemental Notes for Bernoulli Factory Algorithms", linked on the top post of this issue. The proof sketch uses a polynomial scheme that meets the requirements of Nacu and Peres 2005, so that an algorithm that uses the polynomials exists. The algorithm is implicit in those polynomials, not explicit.
I may be mistaken on that result. It didn't take into account that all values of n have to be considered in calculating E[N], not just powers of 2.
Apologies for re-awakening an old thread, I've been looking for source code that might compute jump coefficients for generators like xoshiro256 and your post is the closest thing I've found. Did you ever manage to compute any, and if so do you have any source code (in any language) for doing so?
Closing to start a new topic with a better focus.
This post lists review questions on several documents I have written.
The requests and open questions for all my articles are now in a dedicated page: Requests and Open Questions.
Besides the requests and open questions, comments on any aspect of the documents listed below are welcome, including corrections to any information in them.
Articles on Random Number Generation, Randomization, Exact Sampling, etc.
Random Number Generators
Randomization and Exact Sampling
Color Topics for Programmers: