Compromising accuracy and speed [distributions]

dhardy commented 6 years ago

This is an issue that has been discussed in part several times already in other contexts. But I think it's now worth making an issue specifically about this to get extra feedback.

There are several cases within distribution algorithms that we've had to make decisions about speed-vs-accuracy trade-offs:

Standard now uses 24/53 bits precision (#416); we have discussion around a high-precision variant (#372)
Uniform (for ranges) uses a rejection zone to avoid bias; as noted here we can sometimes use faster approaches if we don't care about small amounts of bias
gen_bool has at least 3 possible implementations with varying accuracy and speed (#411)
gen_ratio is a nice addition to gen_bool with at least two different approachs (#491)

So far, other than with the suggested addition of a HighPrecision distribution (#372) (and the other float sampling methods Open01 and OpenClosed01), we have not given users any choice over this speed-vs-accuracy trade-off but have simply made the choice for users.

Should we try to give users the choice, some how?
Which criteria should we use to choose default implementations?

dhardy commented 6 years ago

Suggestion 1: we just choose defaults with reasonable precision.

Since computing power is inherently limited, sufficiently small biases are unlikely to be noticeable. But how small is good enough? Modern CPUs run up to around 2^32 cycles/second, and a year has around 2^25 seconds, for around 2^57 cycles/year/core. The top supercomputers now have in excess of a million cores, say around 2^24 with around 100 PFlop/s, so 10^14 × 3*10^7 = 3e21 (approx 2^72) floating-point operations per year.

What does this tell us? A bias as small as 1 in 2^128 is unlikely to be detectable, ever (at least without quantum computers). But a bias of 1 in 2^53 (i.e. f64) is potentially detectable even with a desktop PC — in a serious application running non-stop for months or years. On the other hand, many applications wouldn't even notice bias as large as 1 in 2^10. So perhaps a one-size-fits-all solution isn't a good choice, if high-precision comes at a significant performance cost.

Can we say anything else? Above around 1 in 2^20 bias is hard to detect without specifically looking for it, and above 2^40 doing so requires looking quite hard, hence biases this small may be good enough for most purposes, though certainly not all.

(Sorry @pitdicker for paraphrasing a lot of what you said I while back; I forget where.)

dhardy commented 6 years ago

Suggestion 2: use feature flags to control precision

If around 2^20-2^40 bits of precision is enough for most users, then what if we accept such choices as defaults, but add a high-precision feature flag to enable higher-precision algorithms?

There is a potential catch here — activation of higher-precision becomes application-wide, and if any lib in the application requires Rand's high-precision mode, then it gets used everywhere. If we implement this option we should therefore recommend that only applications enable this flag (if required); unfortunately this doesn't play well with Cargo crates bundling both libraries and applications.

Further, this is only going to be a good choice if the high-precision alternatives are still reasonably fast, since users can't pick which algorithm to use at each use-site.

This would probably not affect compilation time or code size much, since the algorithms would be behind feature flags thus unused code can be eliminated early.

Summary: keeps the API clean and simple; Rand still gets a little more complex (multiple implementations); limited control

dhardy commented 6 years ago

Suggestion 3: multiple implementations, as necessary

This is basically the current plan regarding Standard and HighPrecision (#372) — just have multiple implementations which can be selected at the use-site, e.g. rng.gen() vs rng.sample(HighPrecision::new(0.0, 1.0))

For e.g. gen_bool this would require multiple function names; alternatively it might be better to stick with only a single implementation (the current one rounds probabilities down to the next multiple of 2^-64, which is good enough for most users, though perhaps not everyone).

Summary: flexible but with extra API complexity and sometimes worse ergonomics.

dhardy commented 6 years ago

Suggestion 4: generic, over some configuration type

What if we added dummy types High and Low both implementing a Precision trait, then templated over these? I.e. Uniform::<Low, f64>::new(0.0, 5.0), Bernoulli::<High>::new(1e-30), ...

This would probably be quite bad for ergonomics (if it even worked as designed) since presumably users could not omit generic parameters any more (let range = Uniform::new_inclusive(1, 6); would become let range = Uniform::<Low, _>::new_inclusive(1, 6);).

In my opinion the poor ergonomics and extra API complexity are unacceptable; as it is now Rand is already confusing some users.

sicking commented 6 years ago

There's a variant of Suggestion 3 which is that rather than make it a property of which API is used (gen_bool() vs gen_fast_bool()), that we make it a property of the Rng object.

I.e. that something like fast_rng().gen_range(...) would use slightly biased, but faster, algorithms, whereas secure_rng().gen_range(...) would not accept any bias.

Similarly vec.shuffle(fast_rng()) would used slightly biased generation, whereas vec.shuffle(secure_rng()) would not.

We might even be able to create wrappers which wrap Rng objects and which turns on/off bias. Such that fast_rng().unbiased().gen_range(...) would produce unbiased numbers.

I don't know if there's any good way to do this which doesn't introduce runtime cost though. It'd work to stick a fn allow_bias() -> bool function on RngCore and then check that where needed. But I don't know if the optimizer is good enough at optimizing away the check at all times?

Also this approach would make fast_rng().sample(Uniform::new(1, 10)) kind of wonky. The Uniform constructor will calculate the anti-bias zone, but would sample produce biased on unbiased values?

sicking commented 6 years ago

Ultimately I think Suggestion 1 is the most reasonable. Though it scares me a little to think that someone might use a CSPRNG together with a biased algorithm...

pitdicker commented 6 years ago

That is a very good summary, @dhardy!

We seem to be in a bit of a difficult position, as Rand is used for some very different use cases:

randomness in common programming problems
reasonably secure randomness
long-running simulations

In my opinion, we should strongly optimize for the 'common programming problems' (security does not really apply here). Is seems reasonable to expect anyone who writes multiple-month simulations to spend an hour reading docs, or know the limitations of the algorithms he is using.

I like suggestion 1 most, but would also like to see 3 when reasonable.

Specifically:

gen_bool: return to the 32-bit accuracy, and have the Bernoulli distribution for 53~64 bit accuracy. One optimized for common use, one for long-running simulations.
Have both Standard and HighPrecision01.
Uniform range: have to think a bit more about this one. An imperfect method is a good choice when the numbers are relatively small. Maybe we can make it depend on the type, with the faster, but imperfect, method for u8 and u16?

Letting the accuracy depend on the RNG as @sicking suggest is also an interesting idea. But the distinction should not be between normal PRNGs and CSPRNGs. Accuracy is not really a security-related problem. PRNGs that are suitable for simulations are also the ones that are suitable for many common problems. I would say the suitability depends more on where the function is used, than the RNG.

dhardy commented 6 years ago

I'm not sure how @sicking's suggestion could be implemented other than by adding a method like fn use_high_precision_sampling(&self) -> bool to RngCore (specialisation might allow a different approach based on whether an additional trait, HighPrecisionSamplingRng, was implemented, but not yet). Ultimately I'm not so keen on the idea anyway because as @pitdicker says, accuracy is more a use-case problem and less a security problem (though arguably enabling it globally within an application is a reasonable solution, hence suggestion 2).

We could also modify suggestion 2 by adding a low_precision feature in addition to (or instead of) the high_precision one.

Other than that, I just wanted to say that talking about Bernoulli as having 53/64-bit accuracy is a bit mis-leading: it currently supports down to 2^-64, but 53-bit (and even 24-bit) precise floats can represent far smaller positive numbers (MIN_POSITIVE is approx 10^-38 and 10^-308).

pitdicker commented 6 years ago

Just a quick way to say the accuracy lies between 53 and 64 bits :-). Is a chance smaller than, say 2^50, really a chance we see as a reasonable use?

B.t.w., I have been a bit absent the past week, but try to get back to things this evening

Op wo 6 jun. 2018 10:04 schreef Diggory Hardy notifications@github.com:

I'm not sure how @sicking https://github.com/sicking's suggestion could be implemented other than by adding a method like fn use_high_precision_sampling(&self) -> bool to RngCore (specialisation might allow a different approach based on whether an additional trait, HighPrecisionSamplingRng, was implemented, but not yet). Ultimately I'm not so keen on the idea anyway because as @pitdicker https://github.com/pitdicker says, accuracy is more a use-case problem and less a security problem (though arguably enabling it globally within an application is a reasonable solution, hence suggestion 2).

We could also modify suggestion 2 by adding a low_precision feature in addition to (or instead of) the high_precision one.

Other than that, I just wanted to say that talking about Bernoulli as having 53/64-bit accuracy is a bit mis-leading: it currently supports down to 2^-64, but 53-bit (and even 24-bit) precise floats can represent far smaller positive numbers (MIN_POSITIVE is approx 10^-38 and 10^-308).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang-nursery/rand/issues/494#issuecomment-394979772, or mute the thread https://github.com/notifications/unsubscribe-auth/AF9xynYji0sVNqAf9i0tvl_lI9YaXAOvks5t540qgaJpZM4Uabdu .

dhardy commented 6 years ago

Another thing I want to say is that I'd prefer to err on the side of caution (i.e. more precision than necessary) than on the side of performance (for default choices). That said, I think some trade-offs can still be made (e.g. the current Bernoulli should be good enough for the vast majority of users).

This is really more political than it is technical: people are far more likely to criticise incorrect and prematurely-optimised approaches than slow ones; also since in many cases performance will not be a real issue anyway, I don't think it makes sense to "over-optimise".

Because of this, I'm not happy to skip the zone checks for range sampling (e.g. sampling u8 from u32 with Uniform), even though quick tests show 30-50% better performance. If we were to sample e.g. i16 from a u64 backing range then arguably any bias would be too small to matter, but it appears faster to sample an i16 from a u32 backing range with the correct zone checks anyway, and in this case although bias is still small, it's not small enough that I'm happy to ignore it by default.

I therefore think that these range sampling optimisations (and perhaps also the 32-bit gen_bool solution) should either not be used at all, or only be enabled on an opt-in basis. There are probably some uses where the bias really doesn't matter, but in the interests of making users lives simpler I think the only good opt-in solution would be a feature flag (i.e. suggestion 2 inverted).

Good point @pitdicker that anyone writing serious simulations will likely look over the docs closely, so slightly less obvious/ergonomic solutions for people needing high precision is fine (i.e. solution 3), though suggestion 2 may still be easier.

I guess the wrap-up question is would any of these solutions make our lives easier (i.e. more obvious answers to speed-vs-quality compromises) vs harder (more code/complexity)?

No problem @pitdicker :wink:

vks commented 6 years ago

gen_bool: return to the 32-bit accuracy, and have the Bernoulli distribution for 53~64 bit accuracy. One optimized for common use, one for long-running simulations.

I'm skeptical for the following reasons:

If you really care about performance, gen_bool is the wrong interface. Bernoulli::new and sample are potentially faster. The only reason to use 32-bit accuracy over 64-bit accuracy is performance, so a 32-bit gen_bool would be a weird trade-off between convenience and performance.
It makes the API more complicated. Right now, gen_bool is a convenient wrapper over Bernoulli, your suggestion would make it a different algorithm with different trade-offs to consider.

I would like to propose a variation of suggestion 3:

Suggestion 5: split up the distributions into two modules: `fast` and `precise`

Distributions optimized for speed go to distributions::fast, if they are optimized for accuracy they go to distribtuions::precise.

Only the distributions in precise could be reexported to provide a conservative default. Users can pick explicitly which one they want to use, or just pick the default. This would also get rid of weird names like HighPrecision01, which could be just precise::ClosedOpen01. We could also provide a 32-bit based fast::Bernoulli. (I don't think there should be agen_bool using fast::Bernoulli: convenient and less safe don't mix well.)

Summary: flexible but with extra API complexity that can be safely ignored by probably most users

dhardy commented 6 years ago

Nice suggestion @vks! Two niggles:

the names are unwieldy
for maximum compatibility, it would make sense if the two modules are (roughly) equivalent, so users can just use rand::distributions::CHOICE as distributions;

With that in mind, perhaps rand::distributions should contain the default implementations, and rand::distributions::fast (?) contain the fast versions, re-binding all items from the main module which do not have faster equivalents.

One thing though: should we really make HighPrecision01 the default (and also a high-precision variant of Uniform, which we haven't implemented yet and may have large overhead)?

I guess we could have three variants of the module: the default, fast, and precise. Seems somewhat over-designed.

vks commented 6 years ago

the names are unwieldy

Not sure how they could be any shorter. prec instead of precise? To be honest, I would rather shorten distributions.

I think the alternative suggestions except suggestion 1 are more unwieldy.

for maximum compatibility, it would make sense if the two modules are (roughly) equivalent

Yes, it should be safe for fast to reexport precise if there is no faster implementation.

I guess we could have three variants of the module: the default, fast, and precise. Seems somewhat over-designed.

That's what I suggested. You would have to choose between distributions::Dist, distributions::fast::Dist and distributions::precise::Dist. We could decide on a case-by-case basis whether distributions::Dist is reexporting from fast or precise, but this would increase the mental burden for the user.

I don't think it is over-designed (unless providing more than one implementation is already over-designed), because it is backwards compatible and nicely allows to opt into the desired implementation.

dhardy commented 6 years ago

I mean the combined names, e.g. distributions::fast. But no, I don't have a better idea. It'll do.

We could decide on a case-by-case basis whether distributions::Dist is reexporting from fast or precise

I see. Okay, sounds like a good plan, though it leaves two things out:

Short-cuts like Rng::gen, gen_range and gen_bool will not be configurable; users wanting something else must switch to distributions::XYZ::Uniform etc.; I guess this is acceptable though not ideal.
Do we want all of these things in Rand, vs some external library (#290)? And if we do, do we want all implementations in Rand itself? Moving some distributions out to a different library shouldn't be difficult (should we decide to do that), but e.g. moving high-precision variants to an external lib wouldn't be so easy — but probably there is no reason to do that anyway.

sicking commented 6 years ago

But the distinction should not be between normal PRNGs and CSPRNGs. Accuracy is not really a security-related problem.

Yeah, I didn't mean to suggest otherwise. Mainly I was thinking that someone specifically requesting a cryptographically strong likely wouldn't want to use an algorithm with bias as a default. I.e. if you're generating a cryptographically strong security token for authentication, I was thinking you likely don't want more as showing up than zs. Though from a practical point of view, if we make the biases small enough, security doesn't seem affected in a meaningful way. I.e. if a 256-bit key ends up being just 255.99 bits, then that's likely fine.

But I'd want to check with someone that knows crypto better than me before being sure about that.

Suggestion 5: split up the distributions into two modules

I think the distributions are the relatively easy part to solve. Separate structs for fast vs. biased I think would make for both make for a clear and easy to implement API. If we do it by sticking the distributions in separate modules, or by using more expressive names, is not something I have an opinion about.

The bigger problem is all the other APIs. I.e. things like gen_range(), shuffle(), choose(), choose_multiple(), sample_indices(), etc. These are the ones that I was thinking we could do by using flags on the RngCore. But it's not clear to me that that's implementable. Or worth the implementation complexity.

sicking commented 6 years ago

I agree with @pitdicker that we should optimize for "common programming problems". Which I feel like it means that sufficiently small biases is ok.

Though I'm not sure what optimizing for "common programming problems" means for performance. Arguably it means that if making things faster comes at a cost of code complexity (which often is the case), we should only optimize the code if it's likely to show up in real-world profiles, rather than worry too much about squeezing everything out of our micro-benchmarks.

But I'm happy to defer to others. Especially in cases where implementation strategy doesn't affect the public API and so is something that we can easily change over time.

vks commented 6 years ago

Short-cuts like Rng::gen, gen_range and gen_bool will not be configurable; users wanting something else must switch to distributions::XYZ::Uniform etc.; I guess this is acceptable though not ideal.

They trade flexibility for convenience. Making them more flexible defeats their purpose, so I'm not sure what the alternative would be? I'm not a big fan of any of those methods TBH.

This is different for the algorithms operating on sequences (shuffle, choose etc.), which have no alternative. We could possible add shuffle_with_dist(&mut [T], &mut Rng, Fn(usize, usize) -> impl Distribution<usize>) etc., but maybe that is too flexible.

I'm not sure what "common programming problems" means, because it depends so strongly on the programs in question. Either of the alternatives @pitdicker mentioned may be "common" in certain domains.

Maybe it would be more actionable to look at typical implementations used in the popular randomness libraries that have been used a lot without anyone complaining about biases.

Please note that "long-running simulations" are not necessarily "long" in terms of real time but rather in CPU time. On most supercomputers there is an upper limit on the runtime, ranging from a day up to a week or two.

moving high-precision variants to an external lib wouldn't be so easy

Why would it be more difficult than for the others? In any case, I think they should live in the same crate. It might make sense to move the distributions to their own crate. However, at the moment they are tied to some methods of Rng. I think those could be all moved to the distributions or a module for the sequences though.

dhardy commented 6 years ago

I'm not sure what "common programming problems" means, because it depends so strongly on the programs in question.

Agreed; in part also because trying to tie down an exact level of bias which is acceptable in general is difficult.

Two things about @vks suggestion still need finalising:

How do we document this? Obviously we can document the trade-offs each individual implementation makes, but e.g. given fast::Uniform and precise::Uniform, should distributions::Uniform document both implementations and link to them or copy the documentation from whichever it uses or even not talk about precision other than to briefly mention which variant is used?
How do we handle breaking changes where we decide to replace distributions::X with a different implementation of X? Do we tell users that they should explicitly use either fast::X or precise::X if they care about reproducibility or at all about which method gets used? (Related: #408)

pitdicker commented 6 years ago

I'm not sure what "common programming problems" means, because it depends so strongly on the programs in question. Either of the alternatives @pitdicker mentioned may be "common" in certain domains.

I don't really see it as difficult or vague. When we start to talk in terms of "this will take 1+ year of CPU-time before the tiniest bias becomes visible while doing nothing but calling this function", we are clearly outside the "common programming problems" territory.

Maybe it would be more actionable to look at typical implementations used in the popular randomness libraries that have been used a lot without anyone complaining about biases.

That is always a good idea :smile:.

I really don't think suggestion 2 is a good idea. The choice of precision really depends on how something is used. Say a bit more accuracy is wanted in one place. A feature flag would effect every other function, also in every other dependency.

I don't know what to think of suggestion 5. If we only consider Uniform, maybe Standard for floats, and maybe Bernoulli, it makes some sense. But it doesn't really for all the other distributions.

moving high-precision variants to an external lib wouldn't be so easy

Why would it be more difficult than for the others?

Seems like a good alternative to keep in mind. We already had the same discussion around PRNG algorithms. In my opinion is is a good thing for Rand to be opinionated in the functionality it includes. If we can't make good choices, or offer multiple variants if really needed, how do we expect users to choose? Distribution is a trait, there is no harm in having alternative algorithms for some distributions in another crate.

sicking commented 6 years ago

Another approach occurred to me:

Suggestion 6: Use the type of the generated data to determine precision

I noticed that rand used to have a struct Closed01(f32) (or some such) type for generating values in the [0, 1] range.

We could use a similar approach for choosing between high-performance vs. low-bias algorithms. I.e. Uniform<Fast<u32>> would generate slightly biased but higher performance numbers. And Uniform<Precise<u32>> would generate unbiased numbers, but using a slower algorithm. Where Fast looks something like struct Fast<T: ...>(T).

This would unfortunately only work for APIs like Uniform, gen, gen_range, gen_ratio, and gen_bool, which are (or could be) generic over the type of the generated numbers. It does not work for SliceRandom.choose which has no such type parameter.

I'm not sure that I particularly like this solution, but I wanted to add it for completeness.

dhardy commented 6 years ago

In some cases this "wrapping type" notation can be used easily, but to be honest I found Open01 etc. quite annoying to use a lot of the time. E.g. you might want to write this:

let Precise(x) = rng.gen();
// but the type of `x` must be specified somewhere, so you may require:
let Precise(x): Precise<f64> = rng.gen();

dhardy commented 6 years ago

I now agree with @pitdicker's view the most: that in most cases we can just pick a sensible default, and in a few we can add extra implementations (i.e. Suggestion 3).

Proposal:

keep most things as-is
where necessary, add extra implementations whose type names are prefixed by Precise or Fast (i.e. rename HighPrecision01 to PreciseUniform01 and HighPrecision to PreciseUniform)

This seems the simplest solution, and also provides uniform names, without the confusion of shadowed names in sub-modules (I don't like the idea of having distributions::Uniform and distributions::precise::Uniform as in suggestion 5).

Is everyone happy enough with this proposal?

vks commented 6 years ago

I don't like the idea of having distributions::Uniform and distributions::precise::Uniform as in suggestion 5

Why do you think it is confusing? To me, PreciseUniform and precise::Uniform are the same, except that the former is less flexible.

dhardy commented 6 years ago

Because the normal way to import the name is use rand::distributions::precise::Uniform, which doesn't make it clear in the rest of the code which implementation is used. It also requires two use statements with an explicit as NewName in order to use both in the same module.

vks commented 6 years ago

Because the normal way to import the name is use rand::distributions::precise::Uniform, which doesn't make it clear in the rest of the code which implementation is used.

You are free to prefer use rand::distributions::precise and precise::Uniform instead. No need to rename them. Note that this has precedence in std::io::Error vs. std::fmt::Error and ring's API.

sicking commented 6 years ago

FWIW, I think there's relatively little value in adding additional distributions. I think the real value is in if we can provide faster implementations of the "default" functionality like gen_range, shuffle, choose, etc. The same way we do with gen_bool and gen_ratio, where we use "good enough" precision in order to get better performance.

Which really comes down to having a faster, but "good enough", implementation of Uniform.

Defining "good enough" is hard, but we've been able to do it for gen_bool and gen_ratio, so I don't see a reason we couldn't for Uniform, shuffle and choose. Probably something like, "requires running load X for Y months in order to archive a confidence interval over Z% that Uniform doesn't generate evenly distributed numbers".

Additionally having a "perfectly unbiased" distribution like today's Uniform seems like a good idea. But I think the interesting question is what we do for the APIs above.

dhardy commented 6 years ago

@sicking I am open to revisiting the question of bias when sampling small integer ranges, though I don't know exactly what "confidence" parameters we should have.

One of the issues though is allowing smart choices of bias/performance trade-offs. For example, allowing small bias in integer range sampling has a relatively small performance impact, vs allowing small bias in floating-point range sampling (IIUC). So does it make sense only to have distributions::Uniform with the biased optimisations and distributions::precise::Uniform with the maximal precision?

sicking commented 6 years ago

I'm also not sure what confidence interval to use. My thinking was to use 50% as in "more likely than not that there's bias". I think think the pick of load makes at least as big of a difference. I.e. the choice between "do nothing but generate random numbers on every core on a modern MacBook Pro" vs. "generate one random number every microsecond".

I was able to get about a 2x perf improvement on integer sampling by accepting a small bias for 8-bit, 16-bit integers and 32-bit integers. Smaller than the 6x difference between the current Uniform and the code in #531, but same order of magnitude.

So does it make sense only to have distributions::Uniform with the biased optimisations and distributions::precise::Uniform with the maximal precision?

If we do anything at all, then yeah, I think that's what I would do. Though I don't know if I would put "unbiased integer sampling" in the same distribution as "maximum precision float sampling". I feel like bias and precision are related but not the same. But I'm not sure.

dhardy commented 5 years ago

This problem has gone unanswered for a long time now. Since the above was written, many distributions have been moved to the rand_distr crate, and we have multiple implementations of the WeightedIndex distribution.

rust-random / rand