Open Enter-tainer opened 1 year ago
Shame the doc link pictured above didn't work, because the docs do explain.
The "standard normal" is just a common (default) parameterisation of the Normal distribution so it shouldn't be confusing. But search engines being what they are, I can see why "standard distribution" finds the "standard normal distribution".
I guess it could be renamed to Default
... while we're at it, we could shorten distributions
to distr
... these are quite significant breaking changes so I'm not sure how acceptable they would be with users.
Another possibility is to replace it with more specific distributions.
Maybe a UniformAll
distribution for these:
Uniform::new_inclusive($ty::MIN, $ty::MAX)
.Wrapping<u32>
etc.: samebool
: we have rng.gen_bool()
but could keep this (same distr as above)FP (f32
, f64
): just let people write Uniform::new(0.0, 1.0)
. Also makes it clear that x * 2.0 - 1.0
is not the best way to sample from -1.0..1.0
.
char
: we already have Alphanumeric
; we could add a new distribution like Codepoint
(any valid Unicode code-point).
Option<T>
: I don't see real use-cases (beyond testing) for this; maybe we should just make sure it's easy to write an adapter that does it.
Tuples, arrays: again, these could be generated via an adapter.
This is where things get problematic...
We could view a closure as an adapter of sorts (play):
let unif11 = Uniform::new(-1.0, 1.0).unwrap();
let x = rng.sample_closure(move |rng| if rng.gen_bool(0.5) {
rng.sample(&unif11)
} else {
f64::NAN
});
Nice, but:
let
binding, you hit this error: type annotations needed
. Why? Because sample_closure
requires F: Fn(&mut Self)
. Fn
cannot be generic over R: Rng
.Distribution
implementing object from a closure.We could cheat and use dyn Rng
instead, or we could do something horrible complicated with function (or rather trait-impl) currying, but it's probably easier to have users write their own Distribution
impls.
@vks any thoughts on this? I'm tempted not to make any changes: it's significant breakage and most users appear to get on fine with the current design.
I'm also not very happy with the unspecific name Standard
, and I agree that distributions
is too long. StandardUniform
would be better (i.e. the unparametrized uniform distribution, similar to StandardNormal
and StandardGeometric
). However, this would be more restrictive than the current definition:
A generic random value distribution, implemented for many primitive types. Usually generates values with a numerically uniform distribution, and with a range appropriate to the type.
I think it's always a uniform distribution in practice (at least in Rand, and probably 3rd party impls as well).
I'm also not sure it's worth the churn, but maybe lets look at how much breakage this would cause:
Standard
is mostly used by "convenience" wrappers like random()
and Rng::gen()
. I'm also not the biggest fan of them (I prefer sample()
), but they are popular, and I think there would be a lot of protest against removing them (even if it's only for floats). I understand the point of discouraging random() * max
, but this is a pattern that was spread by libraries for decades. Adding docs (or maybe even a lint?) seems like a better investment.Distribution<T> for Standard
to extend random()
/ Rng::gen()
? That would include people implementing this for enums, and bignum libraries, which does not seem so bad. Are there any other common use cases?distribution
to distr
would probably cause a lot more breakage.I think it's always a uniform distribution in practice
Is an Option
having 50% chance of being None
and 50% chance of being some Standard
-sampled sub-value uniform? Debatable maybe but probably not. Maybe we shouldn't even support Option
, but... it's probably useful in test-cases...
I'm also not the biggest fan of them
Same, and I might only consider them useful for something like fuzz-testing, where a uniform distribution may not even be the best choice. Especially for floats. But "garbage in, garbage out" kind of applies here: without contracts on input/output validity (e.g. ranges), fuzz-testing can't really tell you anything useful anyway.
It appears that quickcheck
does still use Standard
though a more specific distribution would have an advantage (but also the disadvantage of less third-party support).
The real argument for keeping Standard
remains rand::random
and Rng::gen
. Is there a more appropriate name here? Maybe the Basic
or Default
distribution? Yes, but neither is significantly better IMO.
Renaming
distribution
todistr
would probably cause a lot more breakage.
And you even forgot that the module name is plural, like std::collections
. Yes, we could rename or alias (but not deprecate an alias). Renaming might be worthwhile.
Closing: no change.
There remains the option of renaming to StandardUniform
, but that brings up more questions like:
Option
?I'm ok with this because I already learn the lesson. But I still think standard
is not a good name. It it more like default
than standard
Default
would conflict with the std
trait and likely be confusing.
my concern is that this distribution is actually not "standard". i dont think there is a standard for random distributions.
i agree with your concern. it can be hard to find a suitable name for this.
How about DiscreteUniform
? I agree that Standard
is not the best name, but unless we find a better alternative, I am personally fine with keeping it.
I think DiscreteUniform
is misleading, because it's not uniform for Option<T>
, and for floats it's not really discrete either.
To justify the churn of renaming Standard
, we would need a significantly better name.
I could live with StandardUniform
, and possibly removing the Option<T>
impl, but I'm not sure this is significantly better.
I'd like to add that "Standard" is misleading not just because of search engines. "Standard deviation" is a common term, and it's often considered in relation to the normal distribution.
All in favour of one of the following, vote with an emoji:
StandardUniform
- :rocket: Standard
- :-1: Default
- :confused: @purplesyringa "Standard deviation" is not unique to the normal distribution. In fact, most (all?) distributions in Rand have one.
Cauchy does not have one ;)
Summary
To address this issue, either api-breaking changes or docs update may happen.
Details & Motivation
I would like to provide some feedback regarding the
distributions::Standard
. I believe that the name of this feature is misleading and can be confusing for users.I came across this feature when I use
rng.gen()
, andStandard
is the default distribution. I'm not sure what it is, so I search it online...When searching for the term "Standard Distribution" online, users are very likely to come across information about the standard normal distribution, which can lead to misunderstandings. Furthermore, this name is not commonly used in other contexts, making it difficult for users to understand what it refers to.
I would like to suggest that consider changing the name of this feature or provide clear warnings to users that it should not be confused with the standard normal distribution. This would greatly improve the usability and clarity of this library.