CHANGE: The name `distributions::Standard` is misleading

Enter-tainer commented 1 year ago

Summary

To address this issue, either api-breaking changes or docs update may happen.

Details & Motivation

I would like to provide some feedback regarding the distributions::Standard. I believe that the name of this feature is misleading and can be confusing for users.

I came across this feature when I use rng.gen(), and Standard is the default distribution. I'm not sure what it is, so I search it online...

When searching for the term "Standard Distribution" online, users are very likely to come across information about the standard normal distribution, which can lead to misunderstandings. Furthermore, this name is not commonly used in other contexts, making it difficult for users to understand what it refers to.

What changes does this require internally?

I would like to suggest that consider changing the name of this feature or provide clear warnings to users that it should not be confused with the standard normal distribution. This would greatly improve the usability and clarity of this library.

dhardy commented 1 year ago

Shame the doc link pictured above didn't work, because the docs do explain.

The "standard normal" is just a common (default) parameterisation of the Normal distribution so it shouldn't be confusing. But search engines being what they are, I can see why "standard distribution" finds the "standard normal distribution".

I guess it could be renamed to Default... while we're at it, we could shorten distributions to distr... these are quite significant breaking changes so I'm not sure how acceptable they would be with users.

dhardy commented 1 year ago

Another possibility is to replace it with more specific distributions.

Maybe a UniformAll distribution for these:

Integers should be the same (including performance) as Uniform::new_inclusive($ty::MIN, $ty::MAX).
Wrapping<u32> etc.: same
bool: we have rng.gen_bool() but could keep this (same distr as above)

FP (f32, f64): just let people write Uniform::new(0.0, 1.0). Also makes it clear that x * 2.0 - 1.0 is not the best way to sample from -1.0..1.0.

char: we already have Alphanumeric; we could add a new distribution like Codepoint (any valid Unicode code-point).

Option<T>: I don't see real use-cases (beyond testing) for this; maybe we should just make sure it's easy to write an adapter that does it.

Tuples, arrays: again, these could be generated via an adapter.

Adapters

This is where things get problematic...

We could view a closure as an adapter of sorts (play):

let unif11 = Uniform::new(-1.0, 1.0).unwrap();
let x = rng.sample_closure(move |rng| if rng.gen_bool(0.5) {
    rng.sample(&unif11)
} else {
    f64::NAN
});

Nice, but:

If the closure is named with a let binding, you hit this error: type annotations needed. Why? Because sample_closure requires F: Fn(&mut Self). Fn cannot be generic over R: Rng.
For the same reason, we cannot automatically construct a Distribution implementing object from a closure.

We could cheat and use dyn Rng instead, or we could do something horrible complicated with function (or rather trait-impl) currying, but it's probably easier to have users write their own Distribution impls.

dhardy commented 1 year ago

@vks any thoughts on this? I'm tempted not to make any changes: it's significant breakage and most users appear to get on fine with the current design.

vks commented 12 months ago

I'm also not very happy with the unspecific name Standard, and I agree that distributions is too long. StandardUniform would be better (i.e. the unparametrized uniform distribution, similar to StandardNormal and StandardGeometric). However, this would be more restrictive than the current definition:

A generic random value distribution, implemented for many primitive types. Usually generates values with a numerically uniform distribution, and with a range appropriate to the type.

I think it's always a uniform distribution in practice (at least in Rand, and probably 3rd party impls as well).

I'm also not sure it's worth the churn, but maybe lets look at how much breakage this would cause:

Standard is mostly used by "convenience" wrappers like random() and Rng::gen(). I'm also not the biggest fan of them (I prefer sample()), but they are popular, and I think there would be a lot of protest against removing them (even if it's only for floats). I understand the point of discouraging random() * max, but this is a pattern that was spread by libraries for decades. Adding docs (or maybe even a lint?) seems like a better investment.
If we rename it, wouldn't it only affect people implementing Distribution<T> for Standard to extend random() / Rng::gen()? That would include people implementing this for enums, and bignum libraries, which does not seem so bad. Are there any other common use cases?
Renaming distribution to distr would probably cause a lot more breakage.

dhardy commented 12 months ago

I think it's always a uniform distribution in practice

Is an Option having 50% chance of being None and 50% chance of being some Standard-sampled sub-value uniform? Debatable maybe but probably not. Maybe we shouldn't even support Option, but... it's probably useful in test-cases...

I'm also not the biggest fan of them

Same, and I might only consider them useful for something like fuzz-testing, where a uniform distribution may not even be the best choice. Especially for floats. But "garbage in, garbage out" kind of applies here: without contracts on input/output validity (e.g. ranges), fuzz-testing can't really tell you anything useful anyway.

It appears that quickcheck does still use Standard though a more specific distribution would have an advantage (but also the disadvantage of less third-party support).

The real argument for keeping Standard remains rand::random and Rng::gen. Is there a more appropriate name here? Maybe the Basic or Default distribution? Yes, but neither is significantly better IMO.

Renaming distribution to distr would probably cause a lot more breakage.

And you even forgot that the module name is plural, like std::collections. Yes, we could rename or alias (but not deprecate an alias). Renaming might be worthwhile.

dhardy commented 5 months ago

Closing: no change.

There remains the option of renaming to StandardUniform, but that brings up more questions like:

Should we remove support for Option?
Should we add another distribution appropriate for fuzzing? ... so the easiest acceptable option is no change.

Enter-tainer commented 5 months ago

I'm ok with this because I already learn the lesson. But I still think standard is not a good name. It it more like default than standard

dhardy commented 5 months ago

Default would conflict with the std trait and likely be confusing.

Enter-tainer commented 5 months ago

my concern is that this distribution is actually not "standard". i dont think there is a standard for random distributions.

i agree with your concern. it can be hard to find a suitable name for this.

newpavlov commented 5 months ago

How about DiscreteUniform? I agree that Standard is not the best name, but unless we find a better alternative, I am personally fine with keeping it.

vks commented 5 months ago

I think DiscreteUniform is misleading, because it's not uniform for Option<T>, and for floats it's not really discrete either.

To justify the churn of renaming Standard, we would need a significantly better name.

I could live with StandardUniform, and possibly removing the Option<T> impl, but I'm not sure this is significantly better.

purplesyringa commented 1 week ago

I'd like to add that "Standard" is misleading not just because of search engines. "Standard deviation" is a common term, and it's often considered in relation to the normal distribution.

dhardy commented 1 week ago

All in favour of one of the following, vote with an emoji:

Switch to StandardUniform - :rocket:
Keep Standard - :-1:
Switch to Default - :confused:
Another suggestion: post below; people can vote :+1:

vks commented 6 days ago

@purplesyringa "Standard deviation" is not unique to the normal distribution. In fact, most (all?) distributions in Rand have one.

benjamin-lieser commented 6 days ago

Cauchy does not have one ;)

rust-random / rand