rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand
Other
1.68k stars 430 forks source link

Compromising accuracy and speed [distributions] #494

Open dhardy opened 6 years ago

dhardy commented 6 years ago

This is an issue that has been discussed in part several times already in other contexts. But I think it's now worth making an issue specifically about this to get extra feedback.


There are several cases within distribution algorithms that we've had to make decisions about speed-vs-accuracy trade-offs:

So far, other than with the suggested addition of a HighPrecision distribution (#372) (and the other float sampling methods Open01 and OpenClosed01), we have not given users any choice over this speed-vs-accuracy trade-off but have simply made the choice for users.

dhardy commented 6 years ago

Suggestion 1: we just choose defaults with reasonable precision.

Since computing power is inherently limited, sufficiently small biases are unlikely to be noticeable. But how small is good enough? Modern CPUs run up to around 2^32 cycles/second, and a year has around 2^25 seconds, for around 2^57 cycles/year/core. The top supercomputers now have in excess of a million cores, say around 2^24 with around 100 PFlop/s, so 10^14 × 3*10^7 = 3e21 (approx 2^72) floating-point operations per year.

What does this tell us? A bias as small as 1 in 2^128 is unlikely to be detectable, ever (at least without quantum computers). But a bias of 1 in 2^53 (i.e. f64) is potentially detectable even with a desktop PC — in a serious application running non-stop for months or years. On the other hand, many applications wouldn't even notice bias as large as 1 in 2^10. So perhaps a one-size-fits-all solution isn't a good choice, if high-precision comes at a significant performance cost.

Can we say anything else? Above around 1 in 2^20 bias is hard to detect without specifically looking for it, and above 2^40 doing so requires looking quite hard, hence biases this small may be good enough for most purposes, though certainly not all.

(Sorry @pitdicker for paraphrasing a lot of what you said I while back; I forget where.)

dhardy commented 6 years ago

Suggestion 2: use feature flags to control precision

If around 2^20-2^40 bits of precision is enough for most users, then what if we accept such choices as defaults, but add a high-precision feature flag to enable higher-precision algorithms?

There is a potential catch here — activation of higher-precision becomes application-wide, and if any lib in the application requires Rand's high-precision mode, then it gets used everywhere. If we implement this option we should therefore recommend that only applications enable this flag (if required); unfortunately this doesn't play well with Cargo crates bundling both libraries and applications.

Further, this is only going to be a good choice if the high-precision alternatives are still reasonably fast, since users can't pick which algorithm to use at each use-site.

This would probably not affect compilation time or code size much, since the algorithms would be behind feature flags thus unused code can be eliminated early.

Summary: keeps the API clean and simple; Rand still gets a little more complex (multiple implementations); limited control

dhardy commented 6 years ago

Suggestion 3: multiple implementations, as necessary

This is basically the current plan regarding Standard and HighPrecision (#372) — just have multiple implementations which can be selected at the use-site, e.g. rng.gen() vs rng.sample(HighPrecision::new(0.0, 1.0))

For e.g. gen_bool this would require multiple function names; alternatively it might be better to stick with only a single implementation (the current one rounds probabilities down to the next multiple of 2^-64, which is good enough for most users, though perhaps not everyone).

Summary: flexible but with extra API complexity and sometimes worse ergonomics.

dhardy commented 6 years ago

Suggestion 4: generic, over some configuration type

What if we added dummy types High and Low both implementing a Precision trait, then templated over these? I.e. Uniform::<Low, f64>::new(0.0, 5.0), Bernoulli::<High>::new(1e-30), ...

This would probably be quite bad for ergonomics (if it even worked as designed) since presumably users could not omit generic parameters any more (let range = Uniform::new_inclusive(1, 6); would become let range = Uniform::<Low, _>::new_inclusive(1, 6);).

In my opinion the poor ergonomics and extra API complexity are unacceptable; as it is now Rand is already confusing some users.

sicking commented 6 years ago

There's a variant of Suggestion 3 which is that rather than make it a property of which API is used (gen_bool() vs gen_fast_bool()), that we make it a property of the Rng object.

I.e. that something like fast_rng().gen_range(...) would use slightly biased, but faster, algorithms, whereas secure_rng().gen_range(...) would not accept any bias.

Similarly vec.shuffle(fast_rng()) would used slightly biased generation, whereas vec.shuffle(secure_rng()) would not.

We might even be able to create wrappers which wrap Rng objects and which turns on/off bias. Such that fast_rng().unbiased().gen_range(...) would produce unbiased numbers.

I don't know if there's any good way to do this which doesn't introduce runtime cost though. It'd work to stick a fn allow_bias() -> bool function on RngCore and then check that where needed. But I don't know if the optimizer is good enough at optimizing away the check at all times?

Also this approach would make fast_rng().sample(Uniform::new(1, 10)) kind of wonky. The Uniform constructor will calculate the anti-bias zone, but would sample produce biased on unbiased values?

sicking commented 6 years ago

Ultimately I think Suggestion 1 is the most reasonable. Though it scares me a little to think that someone might use a CSPRNG together with a biased algorithm...

pitdicker commented 6 years ago

That is a very good summary, @dhardy!

We seem to be in a bit of a difficult position, as Rand is used for some very different use cases:

In my opinion, we should strongly optimize for the 'common programming problems' (security does not really apply here). Is seems reasonable to expect anyone who writes multiple-month simulations to spend an hour reading docs, or know the limitations of the algorithms he is using.

I like suggestion 1 most, but would also like to see 3 when reasonable.

Specifically:

Letting the accuracy depend on the RNG as @sicking suggest is also an interesting idea. But the distinction should not be between normal PRNGs and CSPRNGs. Accuracy is not really a security-related problem. PRNGs that are suitable for simulations are also the ones that are suitable for many common problems. I would say the suitability depends more on where the function is used, than the RNG.

dhardy commented 6 years ago

I'm not sure how @sicking's suggestion could be implemented other than by adding a method like fn use_high_precision_sampling(&self) -> bool to RngCore (specialisation might allow a different approach based on whether an additional trait, HighPrecisionSamplingRng, was implemented, but not yet). Ultimately I'm not so keen on the idea anyway because as @pitdicker says, accuracy is more a use-case problem and less a security problem (though arguably enabling it globally within an application is a reasonable solution, hence suggestion 2).

We could also modify suggestion 2 by adding a low_precision feature in addition to (or instead of) the high_precision one.

Other than that, I just wanted to say that talking about Bernoulli as having 53/64-bit accuracy is a bit mis-leading: it currently supports down to 2^-64, but 53-bit (and even 24-bit) precise floats can represent far smaller positive numbers (MIN_POSITIVE is approx 10^-38 and 10^-308).

pitdicker commented 6 years ago

Just a quick way to say the accuracy lies between 53 and 64 bits :-). Is a chance smaller than, say 2^50, really a chance we see as a reasonable use?

B.t.w., I have been a bit absent the past week, but try to get back to things this evening

Op wo 6 jun. 2018 10:04 schreef Diggory Hardy notifications@github.com:

I'm not sure how @sicking https://github.com/sicking's suggestion could be implemented other than by adding a method like fn use_high_precision_sampling(&self) -> bool to RngCore (specialisation might allow a different approach based on whether an additional trait, HighPrecisionSamplingRng, was implemented, but not yet). Ultimately I'm not so keen on the idea anyway because as @pitdicker https://github.com/pitdicker says, accuracy is more a use-case problem and less a security problem (though arguably enabling it globally within an application is a reasonable solution, hence suggestion 2).

We could also modify suggestion 2 by adding a low_precision feature in addition to (or instead of) the high_precision one.

Other than that, I just wanted to say that talking about Bernoulli as having 53/64-bit accuracy is a bit mis-leading: it currently supports down to 2^-64, but 53-bit (and even 24-bit) precise floats can represent far smaller positive numbers (MIN_POSITIVE is approx 10^-38 and 10^-308).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/rand/issues/494#issuecomment-394979772, or mute the thread https://github.com/notifications/unsubscribe-auth/AF9xynYji0sVNqAf9i0tvl_lI9YaXAOvks5t540qgaJpZM4Uabdu .

dhardy commented 6 years ago

Another thing I want to say is that I'd prefer to err on the side of caution (i.e. more precision than necessary) than on the side of performance (for default choices). That said, I think some trade-offs can still be made (e.g. the current Bernoulli should be good enough for the vast majority of users).

This is really more political than it is technical: people are far more likely to criticise incorrect and prematurely-optimised approaches than slow ones; also since in many cases performance will not be a real issue anyway, I don't think it makes sense to "over-optimise".

Because of this, I'm not happy to skip the zone checks for range sampling (e.g. sampling u8 from u32 with Uniform), even though quick tests show 30-50% better performance. If we were to sample e.g. i16 from a u64 backing range then arguably any bias would be too small to matter, but it appears faster to sample an i16 from a u32 backing range with the correct zone checks anyway, and in this case although bias is still small, it's not small enough that I'm happy to ignore it by default.

I therefore think that these range sampling optimisations (and perhaps also the 32-bit gen_bool solution) should either not be used at all, or only be enabled on an opt-in basis. There are probably some uses where the bias really doesn't matter, but in the interests of making users lives simpler I think the only good opt-in solution would be a feature flag (i.e. suggestion 2 inverted).

Good point @pitdicker that anyone writing serious simulations will likely look over the docs closely, so slightly less obvious/ergonomic solutions for people needing high precision is fine (i.e. solution 3), though suggestion 2 may still be easier.

I guess the wrap-up question is would any of these solutions make our lives easier (i.e. more obvious answers to speed-vs-quality compromises) vs harder (more code/complexity)?

No problem @pitdicker :wink:

vks commented 6 years ago

gen_bool: return to the 32-bit accuracy, and have the Bernoulli distribution for 53~64 bit accuracy. One optimized for common use, one for long-running simulations.

I'm skeptical for the following reasons:

I would like to propose a variation of suggestion 3:

Suggestion 5: split up the distributions into two modules: fast and precise

Distributions optimized for speed go to distributions::fast, if they are optimized for accuracy they go to distribtuions::precise.

Only the distributions in precise could be reexported to provide a conservative default. Users can pick explicitly which one they want to use, or just pick the default. This would also get rid of weird names like HighPrecision01, which could be just precise::ClosedOpen01. We could also provide a 32-bit based fast::Bernoulli. (I don't think there should be agen_bool using fast::Bernoulli: convenient and less safe don't mix well.)

Summary: flexible but with extra API complexity that can be safely ignored by probably most users

dhardy commented 6 years ago

Nice suggestion @vks! Two niggles:

With that in mind, perhaps rand::distributions should contain the default implementations, and rand::distributions::fast (?) contain the fast versions, re-binding all items from the main module which do not have faster equivalents.

One thing though: should we really make HighPrecision01 the default (and also a high-precision variant of Uniform, which we haven't implemented yet and may have large overhead)?

I guess we could have three variants of the module: the default, fast, and precise. Seems somewhat over-designed.

vks commented 6 years ago

the names are unwieldy

Not sure how they could be any shorter. prec instead of precise? To be honest, I would rather shorten distributions.

I think the alternative suggestions except suggestion 1 are more unwieldy.

for maximum compatibility, it would make sense if the two modules are (roughly) equivalent

Yes, it should be safe for fast to reexport precise if there is no faster implementation.

I guess we could have three variants of the module: the default, fast, and precise. Seems somewhat over-designed.

That's what I suggested. You would have to choose between distributions::Dist, distributions::fast::Dist and distributions::precise::Dist. We could decide on a case-by-case basis whether distributions::Dist is reexporting from fast or precise, but this would increase the mental burden for the user.

I don't think it is over-designed (unless providing more than one implementation is already over-designed), because it is backwards compatible and nicely allows to opt into the desired implementation.

dhardy commented 6 years ago

I mean the combined names, e.g. distributions::fast. But no, I don't have a better idea. It'll do.

We could decide on a case-by-case basis whether distributions::Dist is reexporting from fast or precise

I see. Okay, sounds like a good plan, though it leaves two things out:

sicking commented 6 years ago

But the distinction should not be between normal PRNGs and CSPRNGs. Accuracy is not really a security-related problem.

Yeah, I didn't mean to suggest otherwise. Mainly I was thinking that someone specifically requesting a cryptographically strong likely wouldn't want to use an algorithm with bias as a default. I.e. if you're generating a cryptographically strong security token for authentication, I was thinking you likely don't want more as showing up than zs. Though from a practical point of view, if we make the biases small enough, security doesn't seem affected in a meaningful way. I.e. if a 256-bit key ends up being just 255.99 bits, then that's likely fine.

But I'd want to check with someone that knows crypto better than me before being sure about that.

Suggestion 5: split up the distributions into two modules

I think the distributions are the relatively easy part to solve. Separate structs for fast vs. biased I think would make for both make for a clear and easy to implement API. If we do it by sticking the distributions in separate modules, or by using more expressive names, is not something I have an opinion about.

The bigger problem is all the other APIs. I.e. things like gen_range(), shuffle(), choose(), choose_multiple(), sample_indices(), etc. These are the ones that I was thinking we could do by using flags on the RngCore. But it's not clear to me that that's implementable. Or worth the implementation complexity.

sicking commented 6 years ago

I agree with @pitdicker that we should optimize for "common programming problems". Which I feel like it means that sufficiently small biases is ok.

Though I'm not sure what optimizing for "common programming problems" means for performance. Arguably it means that if making things faster comes at a cost of code complexity (which often is the case), we should only optimize the code if it's likely to show up in real-world profiles, rather than worry too much about squeezing everything out of our micro-benchmarks.

But I'm happy to defer to others. Especially in cases where implementation strategy doesn't affect the public API and so is something that we can easily change over time.

vks commented 6 years ago

Short-cuts like Rng::gen, gen_range and gen_bool will not be configurable; users wanting something else must switch to distributions::XYZ::Uniform etc.; I guess this is acceptable though not ideal.

They trade flexibility for convenience. Making them more flexible defeats their purpose, so I'm not sure what the alternative would be? I'm not a big fan of any of those methods TBH.

This is different for the algorithms operating on sequences (shuffle, choose etc.), which have no alternative. We could possible add shuffle_with_dist(&mut [T], &mut Rng, Fn(usize, usize) -> impl Distribution<usize>) etc., but maybe that is too flexible.

I'm not sure what "common programming problems" means, because it depends so strongly on the programs in question. Either of the alternatives @pitdicker mentioned may be "common" in certain domains.

Maybe it would be more actionable to look at typical implementations used in the popular randomness libraries that have been used a lot without anyone complaining about biases.

Please note that "long-running simulations" are not necessarily "long" in terms of real time but rather in CPU time. On most supercomputers there is an upper limit on the runtime, ranging from a day up to a week or two.

moving high-precision variants to an external lib wouldn't be so easy

Why would it be more difficult than for the others? In any case, I think they should live in the same crate. It might make sense to move the distributions to their own crate. However, at the moment they are tied to some methods of Rng. I think those could be all moved to the distributions or a module for the sequences though.

dhardy commented 6 years ago

I'm not sure what "common programming problems" means, because it depends so strongly on the programs in question.

Agreed; in part also because trying to tie down an exact level of bias which is acceptable in general is difficult.

Two things about @vks suggestion still need finalising:

pitdicker commented 6 years ago

I'm not sure what "common programming problems" means, because it depends so strongly on the programs in question. Either of the alternatives @pitdicker mentioned may be "common" in certain domains.

I don't really see it as difficult or vague. When we start to talk in terms of "this will take 1+ year of CPU-time before the tiniest bias becomes visible while doing nothing but calling this function", we are clearly outside the "common programming problems" territory.

Maybe it would be more actionable to look at typical implementations used in the popular randomness libraries that have been used a lot without anyone complaining about biases.

That is always a good idea :smile:.


I really don't think suggestion 2 is a good idea. The choice of precision really depends on how something is used. Say a bit more accuracy is wanted in one place. A feature flag would effect every other function, also in every other dependency.

I don't know what to think of suggestion 5. If we only consider Uniform, maybe Standard for floats, and maybe Bernoulli, it makes some sense. But it doesn't really for all the other distributions.

moving high-precision variants to an external lib wouldn't be so easy

Why would it be more difficult than for the others?

Seems like a good alternative to keep in mind. We already had the same discussion around PRNG algorithms. In my opinion is is a good thing for Rand to be opinionated in the functionality it includes. If we can't make good choices, or offer multiple variants if really needed, how do we expect users to choose? Distribution is a trait, there is no harm in having alternative algorithms for some distributions in another crate.

sicking commented 6 years ago

Another approach occurred to me:

Suggestion 6: Use the type of the generated data to determine precision

I noticed that rand used to have a struct Closed01(f32) (or some such) type for generating values in the [0, 1] range.

We could use a similar approach for choosing between high-performance vs. low-bias algorithms. I.e. Uniform<Fast<u32>> would generate slightly biased but higher performance numbers. And Uniform<Precise<u32>> would generate unbiased numbers, but using a slower algorithm. Where Fast looks something like struct Fast<T: ...>(T).

This would unfortunately only work for APIs like Uniform, gen, gen_range, gen_ratio, and gen_bool, which are (or could be) generic over the type of the generated numbers. It does not work for SliceRandom.choose which has no such type parameter.

I'm not sure that I particularly like this solution, but I wanted to add it for completeness.

dhardy commented 6 years ago

In some cases this "wrapping type" notation can be used easily, but to be honest I found Open01 etc. quite annoying to use a lot of the time. E.g. you might want to write this:

let Precise(x) = rng.gen();
// but the type of `x` must be specified somewhere, so you may require:
let Precise(x): Precise<f64> = rng.gen();
dhardy commented 6 years ago

I now agree with @pitdicker's view the most: that in most cases we can just pick a sensible default, and in a few we can add extra implementations (i.e. Suggestion 3).

Proposal:

This seems the simplest solution, and also provides uniform names, without the confusion of shadowed names in sub-modules (I don't like the idea of having distributions::Uniform and distributions::precise::Uniform as in suggestion 5).

Is everyone happy enough with this proposal?

vks commented 6 years ago

I don't like the idea of having distributions::Uniform and distributions::precise::Uniform as in suggestion 5

Why do you think it is confusing? To me, PreciseUniform and precise::Uniform are the same, except that the former is less flexible.

dhardy commented 6 years ago

Because the normal way to import the name is use rand::distributions::precise::Uniform, which doesn't make it clear in the rest of the code which implementation is used. It also requires two use statements with an explicit as NewName in order to use both in the same module.

vks commented 6 years ago

Because the normal way to import the name is use rand::distributions::precise::Uniform, which doesn't make it clear in the rest of the code which implementation is used.

You are free to prefer use rand::distributions::precise and precise::Uniform instead. No need to rename them. Note that this has precedence in std::io::Error vs. std::fmt::Error and ring's API.

sicking commented 6 years ago

FWIW, I think there's relatively little value in adding additional distributions. I think the real value is in if we can provide faster implementations of the "default" functionality like gen_range, shuffle, choose, etc. The same way we do with gen_bool and gen_ratio, where we use "good enough" precision in order to get better performance.

Which really comes down to having a faster, but "good enough", implementation of Uniform.

Defining "good enough" is hard, but we've been able to do it for gen_bool and gen_ratio, so I don't see a reason we couldn't for Uniform, shuffle and choose. Probably something like, "requires running load X for Y months in order to archive a confidence interval over Z% that Uniform doesn't generate evenly distributed numbers".

Additionally having a "perfectly unbiased" distribution like today's Uniform seems like a good idea. But I think the interesting question is what we do for the APIs above.

dhardy commented 6 years ago

@sicking I am open to revisiting the question of bias when sampling small integer ranges, though I don't know exactly what "confidence" parameters we should have.

One of the issues though is allowing smart choices of bias/performance trade-offs. For example, allowing small bias in integer range sampling has a relatively small performance impact, vs allowing small bias in floating-point range sampling (IIUC). So does it make sense only to have distributions::Uniform with the biased optimisations and distributions::precise::Uniform with the maximal precision?

sicking commented 6 years ago

I'm also not sure what confidence interval to use. My thinking was to use 50% as in "more likely than not that there's bias". I think think the pick of load makes at least as big of a difference. I.e. the choice between "do nothing but generate random numbers on every core on a modern MacBook Pro" vs. "generate one random number every microsecond".

I was able to get about a 2x perf improvement on integer sampling by accepting a small bias for 8-bit, 16-bit integers and 32-bit integers. Smaller than the 6x difference between the current Uniform and the code in #531, but same order of magnitude.

So does it make sense only to have distributions::Uniform with the biased optimisations and distributions::precise::Uniform with the maximal precision?

If we do anything at all, then yeah, I think that's what I would do. Though I don't know if I would put "unbiased integer sampling" in the same distribution as "maximum precision float sampling". I feel like bias and precision are related but not the same. But I'm not sure.

dhardy commented 5 years ago

This problem has gone unanswered for a long time now. Since the above was written, many distributions have been moved to the rand_distr crate, and we have multiple implementations of the WeightedIndex distribution.

Related problems:

Possible solution: add a new module, distributions::impls (or variants; not clear what is the best name). Under this, add a module for each distribution (e.g. distributions::impls::weighted), providing each implementation under a unique, invariant name. The re-export the preferred implementation under distributions.

dhardy commented 3 years ago

By now we have found "nice defaults" (solution 1) for most cases and spun off many implementations to the rand_distr crate. There are still several cases to consider:

My suggestion now is that we stick with "solution 1" (reasonable defaults) for rand and rand_distr, but add a new crate, say rand_precise for high/infinite precision sampling algorithms, and just add new impls there (rand_precise::Bernoulli).

dhardy commented 1 year ago

A lot of this discussion is based on HighPrecision floating-point generation. This is situational, and perhaps unnecessary. Our standard FP generation never produces a whole sub-set of values due to limited precision — which is fine for many uses where that extra precision would never be used anyway (e.g. x+1 would immediately lose it). So for this I agree that it should be a separate distribution (solution 3 or 5) or not at all.

Fast, potentially biased integer sampling is a distinct problem: potentially producing some values too often relative to others. It is behaviourally distinct: for small numbers of samples the bias is statistically non-detectable, while when using sufficiently large numbers of samples the bias could be visible for pretty much any usage. For this I think only solution 1 or 2 is appropriate: if you have a reason to need an unbiased Uniform distribution then exactly the same logic applies to rng.gen_range, slice.shuffle etc.

Having an exact Bernoulli distribution falls under the second case: bias.

The multiple impls of WeightedAlias and index::sample are neither about precision nor bias, therefore nothing to do with this issue.


I propose the following actions:

vks commented 1 year ago

I agree that the high-precision variants could live in their own crate. I'm also not sure whether there are even use cases for them, and a common criticism of rand is too much complexity/flexibility, so it makes sense to split them off. (We should probably decrease our API surface instead of increasing it.[^1])

About unbiased: I think it's great to have a quantitative guideline! I would also prefer an extra crate over solution 2. I think people care more about portability/reproducibility than such small biases. Having an extra crate avoids portability and reproducibility problems, and keeps rand simpler.

[^1]: I would like to revisit rand::distributions::uniform and check whether people actually extend Uniform, or whether we can stop to expose those traits / types.

dhardy commented 1 year ago

I don't see an extra crate as an alternative to solution 2: the biased/unbiased question comes down to "is our simulation likely to have noticeable bias due to the algorithms used" (given the 2^48 threshold used, anything involving less than ~1 year of CPU time is unaffected). For the few cases where the answer is yes, all uses of Uniform, list shuffling, etc. should use unbiased algorithms, hence this is about crate configuration not a new crate.

In my opinion the only relevant question is between solution 1 (one set of algorithms for everything) or 2 (crate configuration), and I don't see a strong rationale either way. (Also relevant: is 5-40% overhead on uniform sampling even an issue, even in things like games? Only if done a lot, as in the ridiculous micro-benchmarks I've been using.)

I would like to revisit rand::distributions::uniform and check whether people actually extend Uniform, or whether we can stop to expose those traits / types. leftwards_arrow_with_hook

The whole rand::distributions::uniform module is essentially an implementation detail people can skip over. If we wanted, we could move it to a new crate, rand_uniform, and import only Uniform into rand, just to hide it. Or just hide it in docs. But this hurts more than it helps.