oscar-system / Oscar.jl

A comprehensive open source computer algebra system for computations in algebra, geometry, and number theory.
https://www.oscar-system.org

unify random generation across Oscar #100

Open rfourquet opened 4 years ago

rfourquet commented 4 years ago

Having one RNG usable in all subsystems of Oscar would be useful.

Cf. discussion started at https://github.com/oscar-system/Oscar.jl/pull/84#issuecomment-621624338. To quote the (current) last comment of it, from @fieker:

Bottom line: for now, we cannot do anything (no manpower). Long-term, it needs to be sorted, as, at least for debugging and development, reproducibility is important; thus using >>1 independent sources of randomness that are totally wild is difficult to support long-term.

I suggest we continue the discussion here.

wbhart commented 4 years ago

What does it mean, "unify random generation"? This is like saying we should have one function for all number theory. We are embarrassing ourselves.

We (currently) have two RNG's in Flint (and a number of randomisation strategies built on top of them):

1) A pair of LCG's for very fast (but still high quality) pseudorandom generation (when appropriately combined as per Marsaglia). Two seeds are required to make use of this.

2) The GMP Mersenne twister for generation of large random values. It has a reasonably low amortised cost, but has a fairly large internal state.

The approach we have taken thus far is about the most minimally stupid approach you can get away with. All our random functions have warnings on them that they are only useful for test code and not to be used for anything serious. Even that is only after making a number of stupid mistakes over the years that caused hangs or very poor test coverage due to us not knowing what we were doing.

We also now have functions for reasonable quality uniform distribution for where that matters (though it is not clear anyone actually trusts us to have gotten that right, and nor should they).

I fail to see what you even mean by having "one RNG" usable by all systems. There are all sorts of random generators for all sorts of different purposes. They aren't in any way "compatible".

Perhaps it might be useful to start by reviewing the ways in which RNG's are used in mathematics and by reviewing some algorithms for pseudorandom generation, e.g. see the high quality but simple generators of George Marsaglia.

Our random generation sucks, by the way. It's going to need some serious work in the future. It's as slow as crap and produces poor quality pseudorandom numbers. Doing it properly is a serious job, which is why we haven't.

fieker commented 4 years ago

On Mon, May 04, 2020 at 07:16:16AM -0700, wbhart wrote:

What does it mean, "unify random generation". This is like saying we should have one function for all number theory. We are embarrassing ourselves.

What I mean is that there needs to be, in the end, ONE central function in Oscar to (re)set the RNG used everywhere in Oscar. Otherwise, we can never achieve reproducibility.

This is hard.

We (currently) have two RNG's in Flint (and a number of randomisation strategies built on top of them):

Of course. The bane of class group computation is the strategy (plus one needs semi-decent randomness to start with).

1) A pair of LCG's for very fast (but still high quality) pseudorandom generation (when appropriately combined as per Marsaglia). Two seeds are required to make use of this.

2) The GMP Mersenne twister for generation of large random values. It has a reasonably low amortised cost, but has a fairly large internal state.

This is probably functionally equivalent to Julia's, while of course subtly different.

The approach we have taken thus far is about the most minimally stupid approach you can get away with. All our random functions have warnings on them that they are only useful for test code and not to be used for anything serious. Even that is only after making a number of stupid mistakes over the years that caused hangs or very poor test coverage due to us not knowing what we were doing.

As I said, I am aware of this. The mixture of fast, random, and strategy is lethal. Thus it should be done in as few places as possible and, there, as well as possible.

We also now have functions for reasonable quality uniform distribution for where that matters (though it is not clear anyone actually trusts us to have gotten that right, and nor should they).

"Matters" is difficult. It matters for:

  • generation of finite fields
  • factoring over finite fields
  • primality testing (well, compositeness tests)
  • finding primitive elements
  • and, of course, testing, to name a few.

I fail to see what you even mean by having "one RNG" usable by all systems. There are all sorts of random generators for all sorts of different purposes. They aren't in any way "compatible".

I mean, e.g., doing a long Julia computation resulting in an error. I need (well: would like) to re-do the computation. It is totally uncool that we had errors that could be reliably triggered by doing a computation 1000 times; the error would then manifest itself due to bad/good randomness. I would like to get the state of the randomness between, say, each iteration, to be able to trigger the error reliably in ONE run. We have some bizarre Travis tests that sometimes fail, but never reproducibly. I strongly suspect: bad/good/(un)lucky randomness. The runtime of 100x computing the same class group of the identical field can vary by orders of magnitude. Impossible to analyse, as I can never re-run the code: the randomness is, well, random. This is why, in the long run, we need to consolidate this. Whether we manage to get away with one random generator (doubtful, as you say, even when restricting to non-crypto use), or at least have one function to get and set the seeds of all RNGs, we can discuss. This is a major amount of work, and will require a long discussion, careful planning, and even more careful implementation; however, unless we do this, we will never achieve reproducibility.

There are some areas where 'only' the runtime will differ, but the actual result is 'unique'; those are easy, as they are stable from this point onwards. (You might need to sort the output for normalisation.) There are others where the abstract mathematical property is fixed, but nothing else is. The prototype is the relation search for any index calculus. If you think about, e.g., the cluster of class-group-related stuff, then

  • the class group is fixed
  • the regulator is fixed
  • the actual units are not, only their average size
  • S-units ditto
  • ideal generators combine all of that

If this is then combined with doing calculations in fields that were generated using this approach, then, even if the fields are 'the same', nothing else is.

For 'small' problems, e.g. factoring, this is 'easy' and also easy to work around. For 'huge' ones, like matrix group recognition, reproducibility matters hugely. In tables (of p-groups, I think), even the labelling of the groups (the classification itself is fine) depends on the source of randomness used at the time.

Thus this needs to be tightly controlled. In Oscar. Nemo, Hecke, Flint, .... can do what they want, know best, ... But in Oscar, I see no way around this.

Perhaps it might be useful to start by reviewing the ways in which RNG's are used in mathematics and by reviewing some algorithms for pseudorandom generation, e.g. see the high quality but simple generators of George Marsaglia.

I think if you send the output through a (fast) hash function, you're fine. (I think; Magma ran into this.)

Our random generation sucks, by the way. It's going to need some serious work in the future. It's as slow as crap and produces poor quality pseudorandom numbers. Doing it properly is a serious job, which is why we haven't.

Yep, agreed (apart from the 'sucks' bit: I have no knowledge nor any opinion there).

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/oscar-system/Oscar.jl/issues/100#issuecomment-623490668

rfourquet commented 4 years ago

What does it mean, "unify random generation". This is like saying we should have one function for all number theory. We are embarrassing ourselves.

This is not how I meant it, feel free to change the title to be more meaningful.

To build on the linked conversation, it means:

  1. at the very least, it would be useful that if I do Random.seed!(0), or e.g. Oscar.rng_seed!(0), and use any Oscar function which uses randomness, then the result is reproducible.
  2. Ideally, IMO, all functions in Oscar which use an RNG would accept an optional rng argument. In this way, if I don't trust the RNG provided by flint for example, I can pass my own RNG, for example a cryptographic one.
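To illustrate point 2, here is a minimal sketch of the proposed pattern (Python, with hypothetical names; in Oscar the argument would be a Julia AbstractRNG): a package-global default generator with a seeding function, and randomized functions that accept an optional rng override.

```python
import random

# Hypothetical package-global default generator (stands in for "the Oscar RNG").
_GLOBAL_RNG = random.Random()

def rng_seed(seed):
    """Hypothetical analogue of the proposed Oscar.rng_seed!: reseed the default."""
    _GLOBAL_RNG.seed(seed)

def random_monic_poly(degree, rng=None):
    """Hypothetical randomized function: coefficient list of a monic polynomial.

    Callers who distrust the default can pass any object with `randrange`,
    e.g. random.SystemRandom() for an OS-entropy-backed generator.
    """
    rng = _GLOBAL_RNG if rng is None else rng
    return [rng.randrange(0, 100) for _ in range(degree)] + [1]

rng_seed(0)
a = random_monic_poly(3)
rng_seed(0)
assert random_monic_poly(3) == a  # reproducible after reseeding the default
```

The key property is that seeding the one global generator makes every function that falls back to it reproducible, while individual calls can still opt out.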

I fail to see what you even mean by having "one RNG" usable by all systems. There are all sorts of random generators for all sorts of different purposes. They aren't in any way "compatible".

In my understanding, there exist random number generators which are deemed good enough for general use (except for cryptography), and fast enough.

Our random generation sucks, by the way. It's going to need some serious work in the future. It's as slow as crap and produces poor quality pseudorandom numbers. Doing it properly is a serious job, which is why we haven't.

For Flint, I'm not sure what is best, but on the Julia/Oscar side, I would investigate whether it's possible to use a Julia RNG as I mentioned in the other thread. I don't see a reason a priori why it wouldn't be possible (but again, I don't know the details here). Until proven otherwise, Julia's default RNG is good for general purposes (except cryptography), even number theory. [And there is a Julia PR (https://github.com/JuliaLang/julia/pull/34852) to make a version of Xoshiro the new default, which is faster and would enable reproducibility even in the presence of multithreaded programs; but no decision has been taken on this yet.]

wbhart commented 4 years ago

To have reproducibility for the user, you would have to either

1) Have a function which sets all random generators to the same seed every time you run the tests, which will certainly get rid of your occasional failing tests, but which will also make random testing essentially useless.

2) Provide the user with all seeds of all generators used, including the very big ones.

I'm telling you now that you cannot do this with one small seed and you cannot do it based on the time and you cannot do it with one RNG to rule them all.

It's a nice goal, but it's also pretty much impossible.

The best I can come up with, even in theory, is to have a single large seed (many bits, the number of which depends on the version of the random generator) which each generator uses as a source of entropy for seeding itself in a deterministic way based on a generator specific algorithm, which then gets saved to disk when the test framework has a failure (or perhaps every time the test suite is run), and then have a function which can reset all the generators based on such a saved token. Though exactly how you get the random values used for a given test is beyond me at present, as it will depend on the random values that came before it and how many of them there were. Seeding a generator so that it starts at a given point in its pseudorandom sequence is only practical for certain kinds of generators (one of many reasons there are so many approaches to random generation).
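The "single large seed as a shared entropy source" idea can be sketched as follows (Python, hypothetical names): each generator derives its own seed deterministically from the master seed plus a per-generator label, so one saved token reproduces every generator's seeding without the generators sharing a stream.

```python
import hashlib

def derive_seed(master_seed: int, label: str) -> int:
    """Deterministically derive a per-generator seed from one master seed.

    Each generator gets its own label ("flint-lcg", "gmp-mt", ...; names
    hypothetical), so all seeds are recoverable from the single master
    seed, yet distinct between generators.
    """
    data = master_seed.to_bytes(16, "big") + label.encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Reproducible: the same master seed always yields the same derived seed...
assert derive_seed(42, "flint-lcg") == derive_seed(42, "flint-lcg")
# ...while different generators get independent seed material.
assert derive_seed(42, "flint-lcg") != derive_seed(42, "gmp-mt")
```

This only addresses the seeding step; as the comment above notes, restarting mid-sequence is a separate and harder problem.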

Let me just be absolutely clear about one thing: I can't make Flint accept an arbitrary RNG supplied from outside of Flint. That is absolutely impractical. Flint has to use different random generators for different purposes as things are. And this is only going to get worse as we improve things.

Once again, there is no single RNG which is good for everything. RNG's are a tradeoff between:

1) various statistical measures of quality (there are many, and some matter more than others in certain contexts, e.g. patterns modulo certain small primes, randomness of various bit ranges, length of the pseudorandom cycle, the likelihood that one kind of number will follow another, how normal the distribution is (sometimes not actually desirable), clustering of values in 2D, 3D and higher dimensions, and many more);
2) speed to get an individual random number;
3) speed to initialise the RNG;
4) amortised cost of getting a random number;
5) size of internal state (useful for starting at a given point in the pseudorandom sequence);
6) size of seed;
7) how easy/fast uniform distribution (or other distributions) are to construct on top of them;
8) (not usually relevant to us) cryptographic suitability.

And no, there are no fast RNG's that are high enough quality for all mathematical applications but which don't become the bottleneck in others. Just the other day I used Nemo to do a computation which required generating random numbers as part of the computation. Random generation was by far the bottleneck (by orders of magnitude). So whatever we are doing currently is absolutely awful.

We don't even try to solve the reproducibility thing in Flint at present. In BSDNT I think we "solved" it by always starting with the same seeds; the random numbers change only as you add new tests. (Don't quote me on it, but that's what I recall.) But it's suboptimal, because you add a new test function, someone else's code breaks, and you get blamed for it.

I strongly suggest we focus on things we can actually fix before "fixing" those we'd need to do a lot of research and absolutely huge quantities of implementation to solve. And in the mean time, we should not make big claims about what is essentially amateur hour.

fieker commented 4 years ago

On Mon, May 04, 2020 at 08:05:58AM -0700, Rafael Fourquet wrote:

What does it mean, "unify random generation". This is like saying we should have one function for all number theory. We are embarrassing ourselves.

This is not how I meant it, feel free to change the title to be more meaningful.

To build on the linked conversation, it means:

  1. at the very least, it would be useful that if I do Random.seed!(0), or e.g. Oscar.rng_seed!(0), and use any Oscar function which uses randomness, then the result is reproducible.
  2. Ideally, IMO, all functions in Oscar which use an RNG would accept an optional rng argument. In this way, if I don't trust the RNG provided by flint for example, I can pass my own RNG, for example a cryptographic one.

No can do... In general, you do not know where this is used:

  • factoring, primality testing
  • creation of (large) finite fields
  • any function that does anything of the above

Users cannot be expected to understand the call stack involved; it is too complex. The best would be an Oscar global variable...


-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/oscar-system/Oscar.jl/issues/100#issuecomment-623519323

wbhart commented 4 years ago

Having said all that, I do vaguely recall there was some trick you could do with a sha digest of a portion of some known fixed pseudorandom sequence into which you could index to get some kind of reproducibility. I'm sure there are papers on it. That might allow us to solve the reproducibility issue if we can dig it up.

fieker commented 4 years ago

On Mon, May 04, 2020 at 08:41:55AM -0700, wbhart wrote:

To have reproducibility for the user, you would have to either

1) Have a function which sets all random generators to the same seed every time you run the tests, which will certainly get rid of your occasional failing tests, but which will also make random testing essentially useless.

Nope. What you do is print the seed at the beginning of the test run, so I can re-run the tests with the same seed.

2) Provide the user with all seeds of all generators used, including the very big ones.

Yep. Or, in Oscar, make it cascading: one small seed for a small/simple RNG, used to produce a large seed for the complicated one.
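The cascade can be sketched concretely (Python, standard library only; the "complicated" generator here is the stdlib Mersenne Twister, and the small RNG is a 64-bit LCG with Knuth's MMIX constants):

```python
import random

def cascade_seed(small_seed: int) -> random.Random:
    """Expand one small seed into a full-size seed for a big-state generator."""
    state = small_seed & 0xFFFFFFFFFFFFFFFF
    material = b""
    for _ in range(624):  # MT19937 holds 624 words of 32-bit state
        # One step of a 64-bit LCG (multiplier/increment from Knuth's MMIX).
        state = (state * 6364136223846793005 + 1442695040888963407) % 2**64
        material += (state >> 32).to_bytes(4, "big")  # keep the high 32 bits
    big = random.Random()  # stdlib Mersenne Twister
    big.seed(int.from_bytes(material, "big"))  # seed it with ~2.5 KB of material
    return big

# The whole cascade is reproducible from the single small seed.
r1, r2 = cascade_seed(7), cascade_seed(7)
assert [r1.random() for _ in range(5)] == [r2.random() for _ in range(5)]
```

The tradeoff wbhart raises below still applies: the cascade only ever reaches as many distinct Twister states as there are small seeds.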

I'm telling you now that you cannot do this with one small seed and you cannot do it based on the time and you cannot do it with one RNG to rule them all.

Well, other systems manage... however, we have the handicap that we started out divided.

I suggest we stop the discussion for now, as it is pointless. I know it can be done, as e.g. Magma does this, solving the same problems as we do. You know it cannot be done, due to various conflicting goals. We agree it is complicated and time-consuming, so now is not the best time for it.

Let's put it into the future (the discussion, the implementation or lack thereof)


-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/oscar-system/Oscar.jl/issues/100#issuecomment-623540621

wbhart commented 4 years ago

In this way, if I don't trust the RNG provided by flint for example, I can pass my own RNG, for example a cryptographic one.

I just want to point out that just because a cryptographic RNG was used as a source of randomness does not mean that an application is "secure". Here is a function which takes a single word from your cryptographically secure RNG and "generates" two words of randomness:

```c
void rand_one_to_two(ulong *out1, ulong *out2, ulong crypto_rand_in)
{
    *out1 = crypto_rand_in;
    *out2 = crypto_rand_in ^ 3;  /* the two outputs are trivially correlated */
}
```

There are actual real world examples of where stupid things like this have been done. Under no circumstances should a non-secure system like a CAS be used for anything secure, with or without a secure RNG!

On the flip side, there are these days algorithms for taking a number of poor random generators and combining them to make good ones (though I never followed up on whether they passed peer review, and I don't know how they work - but I do know they are very slow as all things crypto essentially have to be).

wbhart commented 4 years ago

2) Provide the user with all seeds of all generators used, including the very big ones.

Yep. Or, in Oscar, make it cascading: one small seed for a small/simple RNG, to produce a large seed for the complicated one.

Sigh. What is the point of having an RNG with a large seed if the system seeds it with a small one!? There's a reason RNG's with a large seed use a large seed!

RNGs are like theorems in mathematics. You can't use proofs in contexts where you start with things that violate the assumptions on which those proofs are built. And if you can't prove anything, you shouldn't make any claims whatsoever about your system. In particular, if you break the RNG's used by Oscar we should not make the claim that Oscar is useful for investigating conjectures in mathematics, or that it is well-tested, etc.

rfourquet commented 4 years ago

I just want to point out that just because a crytopgraphic RNG was used as a source of randomness, does not mean that an application is "secure".

Sure, I have enough crypto background to know that. My example was a bit extreme; the suggested idea was that with a crypto RNG, I'm (more) sure that I'm not using a bad PRNG that could be screwing my results.

wbhart commented 4 years ago

@fieker Try to imagine a large supercomputer computation to collect statistics about a mathematical conjecture. Flint, by the way, uses two 64 bit seeds to generate 64 bit random numbers, because there are no known fast generators that are easy to implement which can do it with a single 64 bit seed. And that would not be suitable for such a supercomputer computation if you were serious about it. It might not even be good enough for a moderate sized department cluster if run long enough.

The only thing a small seed is good for is randomised test code. If you are doing something serious other than test code and you want to re-run your exact computation if it breaks, you are going to need something much better.

wbhart commented 4 years ago

I think I remember how the sha digest thing works now. I think it only works if you have a single PRNG for a session. If you change PRNG in a session, or use multiple PRNG's, I don't see how to make it work. (But I think the test code in bsdnt used a globally set PRNG, from a selection of possibilities.)

Before each test you print a sha digest of the internal state of the PRNG. This works for something like bsdnt because we were using the same seed every session. To reproduce a test you run the given PRNG from the initial seed until the sha matches that given before the failing test, and then you can restart that given test with the same internal state of the PRNG. In this way the size of the string presented to the user is independent of the PRNG used and independent of the size of its internal state or seed. This is useful for test code if you are using a PRNG with large internal state such as the Mersenne Twister.

This is of limited usefulness for us, but is a reasonable trick to know about.
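The replay trick described above can be sketched like this (Python; it assumes, as in bsdnt, one global PRNG and a session-fixed initial seed, and "one draw per test" is a simplification):

```python
import hashlib
import random

SESSION_SEED = 123456  # fixed per session, as in bsdnt

def state_digest(rng: random.Random) -> str:
    """Short digest of the PRNG's internal state, printed before each test."""
    return hashlib.sha256(repr(rng.getstate()).encode()).hexdigest()[:16]

def replay_to(digest: str, max_steps: int = 10**6) -> random.Random:
    """Re-run the PRNG from the initial seed until the digest matches,
    recovering the exact state the failing test started from."""
    rng = random.Random(SESSION_SEED)
    for _ in range(max_steps):
        if state_digest(rng) == digest:
            return rng
        rng.random()  # one draw per "test", for illustration
    raise RuntimeError("digest not found: wrong seed or too many steps")

# Simulate a session: run some tests, record the digest before test #4.
rng = random.Random(SESSION_SEED)
for _ in range(3):
    rng.random()
failing_digest = state_digest(rng)
observed = rng.random()  # the value the "failing test" consumed

# Later: reproduce the failing test from the short digest alone.
assert replay_to(failing_digest).random() == observed
```

As noted, the digest printed to the user stays short regardless of how large the generator's internal state is, which is the whole point of the trick.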

wbhart commented 4 years ago

And you don't need to print the entire SHA digest. Just the first word is sufficient for test code.

Of course this is only useful if all functions use the globally set PRNG, which was fine for bsdnt, not so much for a system stitched together from multiple parts. Whether there is a trick for that, I simply do not know. I think not, as any of the PRNG's could have run for any number of iterations and the combinatorial explosion guarantees you can't reconstruct the state from a digest in a short time.

Another hard problem to solve, which we didn't make much progress on, is reproducibility across architectures (e.g. 32 vs 64 bits). We thought about it, but I don't think we ever implemented anything along those lines.

fieker commented 4 years ago

Apart from that, we really should postpone this discussion...

There are different problems:

So: yes, Oscar needs to have (through Flint or otherwise) a range of sophisticated RNGs for different purposes. As many as you want, with as many seeds as you want. Internally, for questions that do not include randomness as an input (e.g. generation of finite fields, factoring nmod_poly, factoring integers, primality testing...), there should ideally be one seed to reset them all, so I can debug and develop and understand users' bug reports.

On Mon, May 04, 2020 at 10:06:20AM -0700, wbhart wrote:

I think I remember how the sha digest thing works now. I think it only works if you have a single PRNG for a session. If you change PRNG in a session, or use multiple PRNG's I don't see how to make it work. (But I think the test code in bsdnt used a globally set PRNG, from a selection of possibilities.)

Before each test you print a sha digest of the internal state of the PRNG. This works for something like bsdnt because we were using the same seed every session. To reproduce a test you run the given PRNG from the initial seed until the sha matches that given before the failing test, and then you can restart that given test with the same internal state of the PRNG. In this way the size of the string presented to the user is independent of the PRNG used and independent of the size of its internal state or seed. This is useful for test code if you are using a PRNG with large internal state such as the Mersenne Twister.

Magma 'solves' this differently: the twister is initialized, indirectly, by something small, e.g. through cascading. Then Magma keeps a count of how many times it was called: same purpose as the hash digest...

This is of limited usefulness for us, but is a reasonable trick to know about.
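The counting scheme attributed to Magma can be sketched as: name a generator state by (seed, number of draws so far), and restore it by fast-forwarding a freshly seeded generator (Python, illustrative):

```python
import random

class CountingRNG:
    """Wrap a PRNG and count draws, so a state can be named by (seed, count)."""
    def __init__(self, seed: int):
        self.seed, self.count = seed, 0
        self._rng = random.Random(seed)
    def random(self) -> float:
        self.count += 1
        return self._rng.random()
    def checkpoint(self):
        return (self.seed, self.count)  # tiny, printable state identifier

def restore(checkpoint) -> "CountingRNG":
    """Fast-forward a fresh generator to the recorded (seed, count)."""
    seed, count = checkpoint
    rng = CountingRNG(seed)
    for _ in range(count):
        rng.random()
    return rng

r = CountingRNG(1)
for _ in range(100):
    r.random()
cp = r.checkpoint()       # e.g. record this before a failing step
expected = r.random()
assert restore(cp).random() == expected  # replay reaches the same state
```

Restoring is linear in the draw count, which is fine for debugging sessions but not for jumping arbitrarily far into a stream.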

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/oscar-system/Oscar.jl/issues/100#issuecomment-623586895

peteroupc commented 4 years ago

I want to note several things:

On reproducibility. Some applications, including tests, care about reproducibility. However, if there are things outside the application's control, reproducibility is often not achievable. Besides pseudorandom number generation, a number of other things affect reproducibility as well.

I explain this in more detail in my section "Ensuring Reproducibility"; see also the references cited there.

The point is that it's not just the RNG that can get in the way of reproducible tests, simulations, etc. And if there is a way to avoid the need for reproducibility, rather than ensure reproducibility (and ensuring it is not exactly trivial), an application should follow that way.

On kinds of tests. Some tests, like fuzz tests, are designed to generate test cases with the help of randomness. For fuzz tests, reproducibility is generally not required unless the test case is impractical to store or distribute. If the generated test cases are stored and used in other tests (such as unit tests), it doesn't matter much what RNG was used to generate that test case, as long as it used a relatively large seed and had high statistical quality.

This is in contrast to unit tests, where randomness and nondeterministic behavior not controlled for should not determine whether the test passes or fails. Unit tests (as opposed to fuzz tests) have no need to generate test cases at random, especially if the code under test does not rely on random number generators at all (as stochastic optimization and Monte Carlo integration do, for example).

If a unit test tests code that uses randomness, it should also not be sensitive to the particular random generator the code uses. For example, rather than setting a seed for an RNG, the unit test could check whether values are acceptable within a given tolerance. By doing so, the unit test will no longer be tied to a particular RNG, so that the RNG can be changed without affecting the test or the application's functionality.
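As a sketch of that testing style (Python; monte_carlo_pi is a hypothetical piece of code under test): the test pins no seed and checks only a tolerance, so the generator behind it can be swapped freely.

```python
import random

def monte_carlo_pi(n: int, rng=random) -> float:
    """Hypothetical code under test: estimate pi by random sampling in the unit square."""
    hits = sum(1 for _ in range(n) if rng.random()**2 + rng.random()**2 <= 1.0)
    return 4.0 * hits / n

# RNG-agnostic test: no seed is pinned; any decent generator should pass.
estimate = monte_carlo_pi(200_000)
assert abs(estimate - 3.14159) < 0.05  # a tolerance, not an exact value

# The generator can be swapped without touching the test.
estimate2 = monte_carlo_pi(200_000, rng=random.SystemRandom())
assert abs(estimate2 - 3.14159) < 0.05
```

The tolerance is chosen far wider than the estimator's standard error at this sample size, so the test is robust rather than tied to one pseudorandom stream.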

lgoettgens commented 1 year ago

cf. https://github.com/Nemocas/AbstractAlgebra.jl/issues/1388

fingolfin commented 1 year ago

I've added Oscar.randseed! in PR #2578, which can seed all RNGs we know about using an integer (i.e. a "small" seed). This does not attempt to provide a way to save and restore full states (although we could do that if we really wanted; I don't see much of a point in that right now).

Beyond that, I don't really see anything in this issue that we can realistically do to make everything use "one rng", nor does it seem sensible to do so (because different things may need different RNGs with different properties). So I am strongly tempted to close this issue. But I'll wait at least until @fieker is back from vacation.