Attachment added by trac user ilan on 2010-04-10: ziggurat.patch
@charris wrote on 2010-04-10
We've had better ziggurat methods than those in Doornik for some years now, but there has been little push in getting them into numpy.
Hi there.
I've completed timing and robustness tests that should
allow apples-to-apples comparisons of the mtrand normal distribution
possibilities. The short story goes like this:
0) One of the 'Harris' ziggurats, zig7, passes all TestU01 crush tests
when running atop each of five PRNGs.
1) Depending on how fast the underlying PRNG is, zig7
outperforms Box-Muller by ~1.3X to ~8.5X.
2) Using gcc, mwc8222+zig7 outperforms mt19937+Box-Muller by 7X.
See you both at SciPy.
Bruce
And here are the details:
In what follows the uniform variate generators are:
lcg64
mwc8222
mt19937
mt19937_64
yarn5
And the normal distribution codes are:
trng - default normal distribution code in TRNG
boxm - Box-Muller, mtrand lookalike, remembers/uses 2nd value
zig7 - a 'Harris' ziggurat indexed by 7 bits
zig8 - a 'Harris' ziggurat indexed by 8 bits
zig9 - a 'Harris' ziggurat indexed by 9 bits
Here are the numbers in more detail:
# Timings from icc -O2 running on a 2.4 GHz Core 2
lcg64 trng: 6.52459e+06 ops per second
lcg64 boxm: 2.18453e+07 ops per second
lcg64 zig7: 1.80616e+08 ops per second
lcg64 zig8: 2.01865e+08 ops per second
lcg64 zig9: 2.05156e+08 ops per second
mwc8222 trng: 6.52459e+06 ops per second
mwc8222 boxm: 2.08787e+07 ops per second
mwc8222 zig7: 9.44663e+07 ops per second
mwc8222 zig8: 1.05326e+08 ops per second
mwc8222 zig9: 1.03478e+08 ops per second
mt19937 trng: 6.41112e+06 ops per second
mt19937 boxm: 1.64986e+07 ops per second
mt19937 zig7: 4.23762e+07 ops per second
mt19937 zig8: 4.52623e+07 ops per second
mt19937 zig9: 4.52623e+07 ops per second
mt19937_64 trng: 6.42509e+06 ops per second
mt19937_64 boxm: 1.93226e+07 ops per second
mt19937_64 zig7: 5.8762e+07 ops per second
mt19937_64 zig8: 6.17213e+07 ops per second
mt19937_64 zig9: 6.29146e+07 ops per second
yarn5 trng: 5.95781e+06 ops per second
yarn5 boxm: 1.19156e+07 ops per second
yarn5 zig7: 1.48945e+07 ops per second
yarn5 zig8: 1.54809e+07 ops per second
yarn5 zig9: 1.53201e+07 ops per second
# Timings from g++ -O2 running on a 2.4 GHz Core 2
lcg64 trng: 6.72163e+06 ops per second
lcg64 boxm: 1.50465e+07 ops per second
lcg64 zig7: 1.31072e+08 ops per second
lcg64 zig8: 1.48383e+08 ops per second
lcg64 zig9: 1.6036e+08 ops per second
mwc8222 trng: 6.64215e+06 ops per second
mwc8222 boxm: 1.44299e+07 ops per second
mwc8222 zig7: 8.903e+07 ops per second
mwc8222 zig8: 1.00825e+08 ops per second
mwc8222 zig9: 1.03478e+08 ops per second
mt19937 trng: 6.52459e+06 ops per second
mt19937 boxm: 1.28223e+07 ops per second
mt19937 zig7: 5.00116e+07 ops per second
mt19937 zig8: 5.41123e+07 ops per second
mt19937 zig9: 5.47083e+07 ops per second
mt19937_64 trng: 6.58285e+06 ops per second
mt19937_64 boxm: 1.42988e+07 ops per second
mt19937_64 zig7: 6.72164e+07 ops per second
mt19937_64 zig8: 7.39591e+07 ops per second
mt19937_64 zig9: 7.46022e+07 ops per second
yarn5 trng: 6.25144e+06 ops per second
yarn5 boxm: 8.93672e+06 ops per second
yarn5 zig7: 1.50465e+07 ops per second
yarn5 zig8: 1.57496e+07 ops per second
yarn5 zig9: 1.56038e+07 ops per second
You should contact Bruce Carneal bcarneal@gmail.com and see if you can get his code. It is in C++ but can probably be translated to C without too much trouble. If we are going in the ziggurat direction, it would be nice to implement the exponential distribution also. As you can see, the overall speedup also depends on the underlying rng, so if you can figure out a way of making choices available there, that will also help. Of course, for small sets of values call overhead will dominate everything.
Thanks for looking into this problem, all it needs is for someone to take it in hand and push it through.
trac user ilan wrote on 2010-04-10
Thanks for your quick response. Yes, the speedup depends on the underlying RNG, and the patch I propose just uses what is already there. The method is quite simple and already gives a more than 2x speedup. My feeling is that by using faster underlying RNGs a lot more could be done, but that is really a separate issue.
Writing the Ziggurat method more generally, so that the exponential (and possibly other) distributions can be handled by the same code, might be possible, but I don't think it's worth the effort: the code is very small, and the tail distribution is different in the other cases.
@charris wrote on 2010-04-10
The 'Harris' ziggurat is mine and the code is actually simpler than the Marsaglia version (Doornik). I also did the exponential (as did Marsaglia), which is even easier. Notice that the Marsaglia version actually uses an exponential to bound the tail. So do I, but I make it part of the ziggurat, which simplifies matters. It takes maybe ten lines of python to compute the needed constants for the various ziggurats. Note that in general you can get about 4x speedup vs Box-Muller for the MT rng.
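Chuck's point about the setup constants can be made concrete. The following is a minimal sketch of my own (not the code discussed in this thread) showing how the layer boundaries and the common layer area for a normal ziggurat can be computed in plain Python; the layer count, bracketing interval, and iteration counts are arbitrary choices:

import math

def f(x):
    # Unnormalized standard normal density.
    return math.exp(-0.5 * x * x)

def tail_area(r):
    # Area under f(x) for x > r.
    return math.sqrt(math.pi / 2.0) * math.erfc(r / math.sqrt(2.0))

def layer_edges(r, n):
    # Walk the layer boundaries down from x[n-1] = r; raise if r is too small.
    v = r * f(r) + tail_area(r)         # common area of every layer
    xs = [r]
    for _ in range(n - 2):
        y = v / xs[-1] + f(xs[-1])      # density at the next (smaller) edge
        if y >= 1.0:
            raise ValueError("r too small for this many layers")
        xs.append(math.sqrt(-2.0 * math.log(y)))
    return xs, v

def build_tables(n=256):
    # Bisect on the rightmost edge r until the topmost layer also has area v.
    lo, hi = 1.0, 10.0
    for _ in range(100):
        r = 0.5 * (lo + hi)
        try:
            xs, v = layer_edges(r, n)
            if xs[-1] * (1.0 - f(xs[-1])) > v:
                hi = r                  # top layer too big: r is too large
            else:
                lo = r
        except ValueError:
            lo = r                      # r too small: move right
    return layer_edges(hi, n)

xs, v = build_tables()
print(xs[0], v)   # roughly 3.654 and 0.00493 for 256 layers

The edges xs and the per-layer area v are what get baked into the rectangle/wedge tables that the generator consults at runtime.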
@charris wrote on 2010-04-10
A few additional comments.
1) The random variable
u = 2.0 * rk_double(state) - 1.0
isn't evenly distributed between positive and negative. I don't think that makes any difference in practice given the other roundoff errors, but it is worth noting because the steps in the ziggurat are symmetric about zero. Accepting the roundoff errors, it can be fixed by shifting and scaling the ziggurat layer to [0, 1-eps], using the positive double, and shifting and scaling the return value. This also avoids the call to fabs and saves one floating-point multiplication.
2)
/* first try the rectangular boxes */
if (fabs(u) < s_adZigR[i])
    return u * s_adZigX[i];
uses a double in the comparison. One of the virtues of the original ziggurat was using integers here, which is most likely faster even on modern hardware.
3) It would be nice to move the initialization to load time so that there is no need to check it for every call. That isn't likely to make much difference given the overhead of the call to the rng but it looks a bit out of place.
4) The function zig_NormalTail can be removed if an improved ziggurat is used.
5) You can avoid the occasional extra call to rk_random if you handle the double conversion yourself. There are enough bits left over from the double to deal with the ziggurat layers. Having the pre-conversion integer form will also allow you to use the integer comparisons from 2); see the sketch after this comment.
Chuck
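To make points 2) and 5) above concrete, here is a rough sketch of the fast path (my own illustration in Python for brevity; the patch itself would do this in C, and the table names ki/wi and the fallback hook are hypothetical, not taken from the patch). One random integer supplies both the layer index and the magnitude, and the rectangle test is a pure integer comparison; sign handling and the wedge/tail logic are omitted:

import random

def zig_draw(ki, wi, fallback):
    # ki, wi: 256-entry integer/double tables precomputed from the ziggurat edges.
    u = random.getrandbits(32)   # stand-in for one call to the core PRNG
    i = u & 0xFF                 # low 8 bits pick one of 256 layers
    j = u >> 8                   # remaining 24 bits are the magnitude
    if j < ki[i]:                # integer compare covers the vast majority of draws
        return j * wi[i]         # a single multiply converts to a double in the layer
    return fallback(i, j)        # rare case: wedges and the tail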
IIRC, this was in the Enthought edition for a while, but messed up results since variates were not produced in the same order as the current algorithm. Adding something like this either needs a new name, or a method argument for the normal generators.
@thouis @charris
Is this patch still active? NumPy's random number generation performance is somewhat behind other choices (Julia or MATLAB, by around a factor of 4), and this seems like a simple method to improve performance for Gaussians, which are both directly useful and important inputs for random numbers from other distributions.
No, but we could take it up again. The current generator needs to be kept for backward compatibility, so a new one would need a new name.
It seems overzealous to require backward compatibility for the specific numbers sampled by the random number generators. All else being equal it is of course better, but if sampling can be sped up 2x or 4x then there is a tradeoff.
@argriffing it's not unimportant for reproducibility of unit tests using random number generators (same reason that a seed needs to be used in such tests).
If anyone really wants the speedup, it's pretty easy to satisfy both impulses. Save the old generator as a subclass, and write a function that takes a numpy version number and fetches a RandomState instance with the corresponding implementations. Then people who need to reproduce old results have an easy way to do it, and people who don't care can get the fastest version.
We do require that the same sequences be generated by the same functions; it is even tested. Reproducibility requires that. But there is no reason not to add new functions, say zignormal. Bruce disappeared after all that work he did to test and verify the algorithms. I still have my code sitting around somewhere, although it is pretty ancient now. IIRC, there was a small bug that Bruce found, overflow or some such, which I think I fixed in my code. Hmm, I've still got all the emails from the summer of 2008. Looks like Bruce also had a SIMD version which sped things up even more.
PS: As before, the scaling, relative to a single serial thread, is 16X (4 wide SIMD X 4 cores).
So if we really wanted fast, that would be one way to go. Multiple cores may make preserving the sequence difficult. Super high performance can probably be pushed off to a specialized package.
I also don't like changing random state for some speed up (no matter how awesome). Now personally, I doubt that will happen, but I used to write scripts with a fixed seed for reproducibility. Some of that reproducibility is gone anyway because even tiny float rounding changes will break it, but still. I don't mind having a method keyword. I also don't mind adding logic such as "If the seed was fixed, warn the user of a future change and use the old version, otherwise just use the new version", then keep the old version around and after a while make the faster one default. And maybe right away or a little later remove the warning.
Bruce is on Linkedin. I never got any of my code back, so it may be worth contacting him first if we decide to pursue this.
I can see an argument either way for making improvements opt-in vs opt-out. But I do feel strongly that
Looks like Bruce also had an SIMD version which sped things up even more.
I would assume this is SFMT, http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html, which is a SIMD implementation of the standard MT (same state). This is orthogonal to the ziggurat change, which only affects normal generation. There is another generator by the same authors (dSFMT) that specializes in doubles and is potentially even faster for uniforms with double precision - it is used by Julia.
It would seem to me the first step would be to get zignormal into NumPy as a stand-alone normal generator (ditto zigexp) that is exposed. Then users who are performance sensitive could start making use of these. Once these have seen some use, it might make sense to make them the default RNGs for normals and exponentials.
I am sympathetic to the above point that most users will want the fastest method to generate a random variable from any distribution F, and when this requires a normal (e.g. a Student's t), then it would be preferable to use zignormal.
I also don't like changing random state for some speed up (no matter how awesome). Now personally, I doubt that will happen, but I used to write scripts with a fixed seed for reproducibility. Some of that reproducibility is gone anyway because even tiny float rounding changes will break it, but still. I don't mind having a method keyword. I also don't mind adding logic such as "If the seed was fixed, warn the user of a future change and use the old version, otherwise just use the new version", then keep the old version around and after a while make the faster one default. And maybe right away or a little later remove the warning.
Moreover, if exact reproducibility is really crucial, someone can always set up a virtualenv with python=2.7 numpy=1.8 and then they should get the same thing. Requiring rng reproducibility across major versions is tantamount to agreeing never to change the default core RNG.
Many people do care about reproducibility; we even get bug reports that our MT implementation gives different results than MATLAB and R for 64-bit numbers from time to time. How about a keyword argument to normal?
FWIW, in numpy 1.9 the random module can now be used from multiple threads via the threading module, but you do need a state per thread, as each state still has a lock.
Just for fun, it would be great to add this to the rng wishlist:
http://www.deshawresearch.com/resources_random123.html
It can be used to generate many independent streams (2**64) in parallel.
Many people do care about reproducibility; we even get bug reports that our MT implementation gives different results than MATLAB and R for 64-bit numbers from time to time.
There are definitely some differences - MATLAB, for example, produces random integers with a range up to 2**53, which indicates they are using a double uniform generator and then converting to integer.
The base algorithms are often the same (the 32-bit Mersenne Twister); the differences come from how those 32-bit numbers are converted to 64-bit numbers.
But actually dropping reproducibility for something like normal could be more acceptable, as the current Box-Muller method implies using libc math functions where one has no control over how accurate they are. So we are already only providing approximate reproducibility for these.
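For reference on the conversion point above, the usual Mersenne-Twister-family recipe (which, as far as I know, is what randomkit follows) builds a 53-bit double from two 32-bit draws, and this is exactly the kind of place where implementations can diverge:

import random

def random_double(next_uint32):
    # 27 high bits of one draw plus 26 high bits of another give 53 random bits,
    # matching the precision of a double's mantissa.
    a = next_uint32() >> 5                                # 27 bits
    b = next_uint32() >> 6                                # 26 bits
    return (a * 67108864.0 + b) / 9007199254740992.0      # (a*2**26 + b) / 2**53

print(random_double(lambda: random.getrandbits(32)))      # uniform on [0, 1)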
Requiring rng reproducibility across major versions is tantamount to agreeing never to change the default core RNG.
And indeed, that is what we have agreed to. We can add new core PRNGs, but not change the core PRNG of RandomState. I'm not sure I can be made to care too much about the convenience aliases at the numpy.random level.
My new favorite candidate for a new core PRNG: http://www.pcg-random.org/
PCG is definitely sweet. Random123 is probably also worth a look-in: http://www.deshawresearch.com/resources_random123.html
(Cribbed from the thread at JuliaLang/julia#5105 that github helpfully auto-linked above. They also have some other links to recent papers on optimizing RNGs.)
I have started some initial work trying to produce a usable implementation of pcg at https://github.com/bashtage/pcg-python.
Right now it has some basic RNGs implemented, as well as the ability to produce multiple streams and to advance (skip to a point in the rng chain). It is a little faster (25-50%) than the NumPy rngs, although it doesn't do as much (e.g., it returns only 1-d arrays). I'm no expert at C/Cython integration, so I would imagine there are some things I could do to eke out some more performance (e.g., doing more directly in C).
The numpy mtrand code seems complex to me, and I'd welcome any collaboration.
@bashtage Excellent! I have a C implementation of pcg64 with a software implementation of uint128 arithmetic for those platforms that need it.
I think it's worth playing around with the interface that you have now to work out the API for keyed streams and the like before investing the energy to make the rest of the RandomState infrastructure deal with it; that's gonna take a lot of work, unfortunately. One possible path forward is to copy the randomkit and distributions C files over to your project and work on making that part of it switch between the Mersenne Twister and the PCG variants. We can probably iterate on that part faster in your project than trying to do the integration in numpy, which has more moving parts. But once we get that solved, it will be easier to migrate it over into numpy, where we can focus on the rest of the moving parts (i.e. mtrand.pyx itself).
@rkern I've done some initial experiments with the pcg64 and the performance is a little disappointing. For example, 1,000,000 uniforms takes about 5.7ms with pcg32 and 5.0ms with pcg64. When using the NumPy Gaussian generation method this difference is completely lost. So from a performance POV pcg64 doesn't add much (all tests using gcc 4.8.1, so YMMV). pcg64 does have a longer period and can produce more streams, so it still might be worth it (also probably better to just go to the better generator).
Looking through mtrand.pyx, randomkit, distributions, etc., this seems like excessively complex code. I know it was written a long time ago (from what I can tell mostly < 2005), so this isn't a criticism. In an ideal world, the random generation code would be refactored so that it would accept random uint32/uint64s from an arbitrary generator. This would allow further generators to be added with much less pain. This is also unfortunately outside of my competency, as there are things in mtrand.pyx that I don't really understand.
I also noticed there are a lot of functions in distributions.c that don't really need to be written in C (e.g. a Cauchy rng, which is just the ratio of two normals). From a maintenance POV it might make sense to have a layered approach, from lowest level to highest:
- vendor-supplied rng code (e.g. pcg-*.c)
- shim to provide a consistent interface
- core random number generators (i.e., rngs with loops)
- cython file with class and non-core rngs
Is your modified pcg64 on github?
FWIW the fundamental reason I am interested in this is not performance but the need (which I think is pretty essential) for a real multi-stream/advance-able RNG for parallel applications.
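On the advance/jump-ahead point: for LCG-based generators (the family PCG is built on), skipping ahead n steps costs O(log n) rather than O(n), because n applications of the affine update compose into a single affine map. A self-contained sketch of that standard trick (my own illustration; PCG's advance works on this principle, but this is not its code):

def lcg_advance(state, n, mult, inc, mask=(1 << 64) - 1):
    # State after n steps of s -> (mult*s + inc) mod 2**64, in O(log n) time,
    # by repeated squaring of the affine update map.
    acc_mult, acc_inc = 1, 0
    cur_mult, cur_inc = mult, inc
    while n > 0:
        if n & 1:
            acc_mult = (acc_mult * cur_mult) & mask
            acc_inc = (acc_inc * cur_mult + cur_inc) & mask
        cur_inc = ((cur_mult + 1) * cur_inc) & mask   # compose cur with itself
        cur_mult = (cur_mult * cur_mult) & mask
        n >>= 1
    return (acc_mult * state + acc_inc) & mask

# Check against stepping one at a time (constants are Knuth's MMIX LCG).
MULT, INC = 6364136223846793005, 1442695040888963407
s = 12345
for _ in range(1000):
    s = (MULT * s + INC) & ((1 << 64) - 1)
assert s == lcg_advance(12345, 1000, MULT, INC)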
My main motivation for pcg64 is the period, not speed; I'm surprised that it's faster at all. The typical advice is that one should not draw more numbers than sqrt(period). So for pcg32, sqrt(2**64) ~ 5 billion, which is plausible for the more demanding use cases, which we have to design for when we choose a general-purpose default. Also, it's nice to only do one draw to get a double.
The approach I recommended above is along the lines of what you are suggesting and should help avoid the complexity of mtrand.pyx per se until we have to. Modify rk_state to be that shim of a consistent C-level interface. Make it work both with a pcg* and with the Mersenne Twister as the underlying core PRNG. Make the C distribution functions work with that generic rk_state interface. Ignore mtrand.pyx for now.
I still need to add the copyright mumbo-jumbo to my implementation of pcg64, then I'll stick it up on Github.
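A Python caricature of the layering described above (purely illustrative; the real shim would be C, and every name here is made up): the distribution code is written only against a tiny core-generator interface, so swapping the Mersenne Twister for a PCG variant never touches it, and a 64-bit core generator can produce a double from a single draw:

import math

class CoreGenerator:
    # Minimal interface the distribution functions are written against.
    def next_uint64(self):
        raise NotImplementedError

    def next_double(self):
        # One 64-bit draw -> one double in [0, 1) with 53 random bits.
        return (self.next_uint64() >> 11) * (1.0 / 9007199254740992.0)

class ToyLCG64(CoreGenerator):
    # Stand-in core generator (Knuth's MMIX constants); not for real use.
    def __init__(self, seed=1):
        self.state = seed & 0xFFFFFFFFFFFFFFFF

    def next_uint64(self):
        self.state = (6364136223846793005 * self.state
                      + 1442695040888963407) & 0xFFFFFFFFFFFFFFFF
        return self.state

def standard_normal(gen):
    # Distribution code sees only the CoreGenerator interface
    # (plain Box-Muller here just to keep the sketch short).
    u1 = 1.0 - gen.next_double()          # avoid log(0)
    u2 = gen.next_double()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

print(standard_normal(ToyLCG64(seed=42)))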
My main motivation for pcg64 is the period, not speed; I'm surprised that it's faster at all. The typical advice is that one should not draw more numbers than sqrt(period). So for pcg32, sqrt(2**64) ~ 5 billion, which is plausible for the more demanding use cases, which we have to design for when we choose a general purpose default.
This said, the effective period is more like 2**127, since you can have 2**63 (independent-ish) streams. Of course, this requires some understanding of the RNG to avoid making mistakes.
Also, it's nice to only do one draw to get a double.
I am guessing this is where the small speed increases come in.
I now have a generic structure that allows alternative core RNGs to be used. Basically it uses the structure:
- core rng files
- shim that provides a small number of functions, but especially random uint32/64 and the state structure, as well as a pxi file
- generic distributions file
- generic pyx file + includes and conditional compilation
These are all combined to produce rng-specific extensions (right now it outputs 4).
Comments welcome at this point.
Some performance numbers:
Time to produce 1,000,000 normals
****************************************
dummy_inv_standard_normal 51.90 ms
dummy_zig_standard_normal 15.44 ms
numpy.random_inv_standard_normal 68.71 ms
pcg32_inv_standard_normal 61.30 ms
pcg32_zig_standard_normal 26.00 ms
pcg64_inv_standard_normal 59.90 ms
pcg64_zig_standard_normal 23.01 ms
randomkit_inv_standard_normal 66.53 ms
randomkit_zig_standard_normal 31.66 ms
Normals per second
****************************************
dummy_inv_standard_normal 19.27 million
dummy_zig_standard_normal 64.75 million
numpy.random_inv_standard_normal 14.55 million
pcg32_inv_standard_normal 16.31 million
pcg32_zig_standard_normal 38.46 million
pcg64_inv_standard_normal 16.69 million
pcg64_zig_standard_normal 43.47 million
randomkit_inv_standard_normal 15.03 million
randomkit_zig_standard_normal 31.58 million
Speed-up relative to NumPy
****************************************
dummy_inv_standard_normal 32.4%
dummy_zig_standard_normal 344.9%
pcg32_inv_standard_normal 12.1%
pcg32_zig_standard_normal 164.2%
pcg64_inv_standard_normal 14.7%
pcg64_zig_standard_normal 198.7%
randomkit_inv_standard_normal 3.3%
randomkit_zig_standard_normal 117.0%
Dummy is a trivial rng that repeats the same 20 numbers. Just used to test another hypothetical RNG.
Sounds like you're making some great progress! I think that eventually, the API we're going to want to expose for alternative core RNGs will be something like:
- For each core RNG, there is a single Python class -- e.g. np.random.MersenneTwister, np.random.PCG64, etc. Instances of these classes hold the RNG state and expose some basic methods for initialization, generation, and getting/setting state (probably only at the C/Cython level -- we'll want to keep most of this API private for now).
- The RandomState object is then changed so that it HAS-A core RNG object that it stores as an instance variable, and we replace all the current calls to global rng functions with calls to methods on this instance variable.
Common end-user usage would look like:
r = RandomState()
r = RandomState(seed=123)
r = RandomState(rng=np.random.PCG64(seed=123))
r = RandomState(seed=123, rng="PCG64")
My objections to the HAS-A API still hold: https://github.com/numpy/numpy/issues/5299#issuecomment-97764637
Responding to your comment there:
The core generators that I am most interested in adding are those that can be parameterized, but each one in different ways. Making the argument signature dependent on the value of one of the arguments is an antipattern I would like to avoid.
Yes, this is one of the advantages of the HAS-A api -- each core generator class can get its own __init__ signature. In my proposal we still accept a seed argument in RandomState.__init__ for backcompat, but this would just be a shorthand for the real api where users can instantiate the core generator object however they want.
I am inclined against having the distribution implementations change by a configuration parameter, but I could be persuaded. Not least, having this mechanism be orthogonal to the selection of the core generator would be ideal.
I guess we've converged now on using an instance parameter to change distribution implementations, and exactly the idea here is that having the implementations and the core generator as separate variables makes them orthogonal -- plus it makes it possible to change the default generator using the same logic as we use for changing the default level. (If no generator or seed is specified, we want to default to the best/fastest generator -- but backcompat means that this is only possible to do if we're also able to switch back to MT if the user later calls seed.)
In my proposal we still accept a seed argument in RandomState.__init__ for backcompat, but this would just be a shorthand for the real api where users can instantiate the core generator object however they want.
Please don't include the shorthand, then.
(If no generator or seed is specified, we want to default to the best/fastest generator -- but backcompat means that this is only possible to do if we're also able to switch back to MT if the user later calls seed.)
seed() silently switching algorithms is a really confusing behavior! Please don't do it. I would suggest simply leaving RandomState alone at the Python API level (under the Cython hood, you can do what you like). I would object less to a HAS-A API under a new name that does not have backwards-compatibility requirements.
There is nothing sacred about the name RandomState. It's not even a very good name for what it does.
Please don't include the shorthand, then.
We're not going to break backcompat, and this is hardly complicated...:
class RandomState:
    def __init__(self, seed=None, rng=None):
        if rng is None:
            rng = DefaultRNG(seed)
        elif seed is not None:
            raise ValueError("Can't specify seed if rng object is given")
        ...
The question of whether we should also accept a string-based shorthand for well-known RNG objects is a little more complicated, and I'm not sure whether or not it's a good idea myself; I'm happy to defer that until we get more into the details.
seed() silently switching algorithms is a really confusing behavior! Please don't do it.
Before, seed set the state of the RNG used for future calls. Now, it would set the state+algorithm used for future calls. What's confusing about that? The version code already has seed potentially changing the algorithm used for complex distributions, which is substantially harder to explain (but it doesn't much matter, because users don't have to care so long as streams are reproducible).
The alternative is that we could just continue defaulting to the old-and-slow-and-lower-quality algorithms forever, but that seems like a worse outcome to me...
Oh, hmm, also: @bashtage: could you open a new issue for the work on replacing the core generator? It's orthogonal to the ziggurat issue which this issue is nominally about...
Ah, the shorthand that you proposed, r = RandomState(seed=123, rng="PCG64"), does not follow the semantics that you then sketched out. It's this that I object to, because the initialization of the new core PRNGs of interest takes various forms, not just a single seed number/array. For example, the most desired new feature of the PCG family is that it has selectable streams.
The interface that I would prefer is one in which I grab the class that I want and use it rather than grabbing two classes and combining them. It's simple and well proven by the standard library.
from numpy.random import PCG64
prng = PCG64(seed=1234, stream=5678)
x = prng.weibull(...)
prng.jumpahead(10000)
y = prng.normal(...)
# Restart.
prng.seed(1234)
assert_array_equal(x, prng.weibull(...))
Compared to:
from numpy.random import RandomState, PCG64
prng = RandomState(rng=PCG64(seed=1234, stream=5678))
x = prng.weibull(...)
prng.rng.jumpahead(10000)
y = prng.normal(...)
# Restart.
prng.seed(1234) # Wait, we're now back to using Mersenne Twister?!
assert_array_equal(x, prng.weibull(...)) # AssertionError
The standard approach has the benefit of being manifestly backwards compatible. RandomState will always be the Mersenne Twister. We would refactor its guts to use a HAS-A implementation at the C/Cython level, but that doesn't need to show up at the Python API level to make us go into contortions to preserve backwards compatibility.
I have started an issue to look at multiple PRNGs: #6967
@charris Why is backward compatibility an issue for variate generation? The documentation never promised that the same variates would be generated across versions.
@NeilGirdhar It was to allow repeatability. Changing that is under discussion.
@NeilGirdhar It's documented here ("Compatibility Guarantee"): https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html#numpy.random.RandomState
@rkern Oh, I see, my mistake.
xref #13163 . This is fixed in the randomgen branch.
The randomgen branch now supports external BitGenerators, and supports PCG64 natively although there is some discussion about removing some of the BitGenerators. Closing, thanks for providing some of the impetus to make this change.
Please reopen if you feel there are pieces of this missing in the new API.
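For anyone landing here later: as I understand it, the API that eventually shipped (NumPy 1.17 and later, not part of this thread) separates the bit generator from the distributions, and its default normal generation is ziggurat-based, roughly:

from numpy.random import Generator, PCG64

rng = Generator(PCG64(12345))       # explicit, seedable bit generator
x = rng.standard_normal(1000000)    # ziggurat-based normals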
Original ticket http://projects.scipy.org/numpy/ticket/1450 on 2010-04-10 by trac user ilan, assigned to @charris.
I've written a patch which replaces the Box-Muller transform in numpy/random/mtrand/randomkit.c with the faster Ziggurat method. The patch also updates the doc-strings to include the relevant references:
My patch is based on the support code which J. Doornik has made available on his web page (first reference). I've contacted him, and he has given permission to include his code in numpy on the condition that his paper is referenced (which the patch does).
The patch was written against http://svn.scipy.org/svn/numpy/tags/1.4.0, and I've just checked to make sure it can still be applied to the trunk (revision 8324). I've tested the patch on Windows, MacOSX, Linux and Solaris.
The speedup is a bit over 2x when creating a large number of standard-normal-distributed random numbers. To verify that the method works and produces good random numbers, I have:
Feel free to hammer the patch. I hope this patch will make its way into numpy soon.