Open lehins opened 4 years ago
It isn't uniform. Let take for sake of simpliciti Ratio Word3
. In this case we will generate with equal probability each value in the list:
liftA2 (%) [0..7] [1..7]
. They cover range [0, 7] but only 5 values are in [3.5 .. 7] range.
@Shimuuar It is uniform as far as the Ratio Word3
is concerned. It is just as uniform as selecting uniformly from data RGB = Red | Green | Blue
data type with equal probability. We generate all possible values that a data type can have with uniform probability.
I understand that with respect to the actual rational numbers over the infinite space, this is not uniform, but the type is not infinite. That means we need to decide. Is Uniform
about math and real values, or it is about the uniformity of types? I am fine with either choice, by the way, because we do still have Random
class that allows us to deal with this issue.
When it come to numbers uniform usually understood as distributed on number line with uniform density. For Ints & Words we work on uniform grid so notions of uniform density and generate every inhabitant with equal probability coincide. For floats we try to approximate density and abandon every inhabitant with equal probability.
But this instance is not uniform in either sense! It generates 1
7 times out of 56, and 7
only 1 of 56.
The problem is that Ratio Word3
has not 8*(8-1)=56, but only 36 inhabitants: 0 and 1 appear 7 times of 56, 1/2 and 2 - thrice; 1/3, 2/3, 3/2 and 3 - twice. It is certainly non-Uniform
inhabitants-wise.
I see what you mean. So my suggested implementation is bogus. That's fine. Question is can we come up with a sensible implementation(s) for the Ratio
type?
For Ratio Word
one can generate n
and d
; if gcd n d == 1
then return n % d
, otherwise reject them and generate new ones. But things may become more complicated for Ratio Int
because of signs and minBound /= negate maxBound
, need to think more about it.
In my experience Ratio Int
/ Ratio Word
are of limited utility, because it is too easy to get an overflow in denominator.
Also I think that approach "generate every inhabitant with equal probability" is just a giant footgun. I don't think anyone would expect such semantics.
I don't think anyone would expect such semantics.
2 % 4
and 1 % 2
are equal inhabitants as far as Ratio
is concerned, because values are normalized. So here we are really talking about generating all valid inhabitants with equal probability.
@Shimuuar What would you expect if someone asked you: Generate a random rational number for me please?
@Bodigrim I remember helping out a friend with this and he came up with this validation for Ratio
: https://github.com/NorfairKing/validity/blob/cfe18e509a7c3298a15751c5889254cf0650cbc2/validity/src/Data/Validity.hs#L500-L508
CC @NorfairKing
@lehins For the record, this is how genvalidity
generates Ratios:
https://github.com/NorfairKing/validity/blob/cfe18e509a7c3298a15751c5889254cf0650cbc2/genvalidity/src/Data/GenValidity.hs#L718-L728
genValid = (do
n <- genValid
d <- (genValid `suchThat` (> 0))
pure $ n :% d) `suchThat` isValid
You can probably generate the denominator more easily if you assume Num
and use abs
.
@Shimuuar What would you expect if someone asked you: Generate a random rational number for me please?
Well I certainly don't expect such questions. But it's impossible to speak about generating random numbers without specifying distribution first. So we either know what we're talking about or person asking such question deserve some ridicule
But then if we have Uniform why nor UniformRange? How UniformRange should work then? It it uses same approach: "generate every inhabitant with equal probability" why then it works differently from Float/Double. They're just a different subset of rationals after all.
FWIW if I'd ask a library to generate random Ratio
in the place where I don't care much about distribution, I would expect that it has a uniform distribution over requested range, and I would consider non-uniform distribution as a bug.
Well I certainly don't expect such questions.
Lol. I am asking you one right now ;) Basically how would you solve it? We need uniform distribution for Ratio
data type
"Don't try to be clever just use doubles". It's question about building unform distributions between two really big numbers (0, maxBound :: Word64) for example. It's very hard problem since it requiring carefully weighting numbers on very uneven grid. I'm not sure it's even possible without excessive lookup tables
"Don't try to be clever just use doubles"
lol. I am already submitting a PR to ghc to remove Ratio
from base :D
It is a hard problem, but I don't think it is an unsolvable one. We definitely don't have to decide on it now, so let's think about it and see what we can come up with.
lol. I am already submitting a PR to ghc to remove Ratio from base :D
I'm quite serious BTW. Generating doubles, casting them to Rationals and working with them in case you need to work with them is quite viable approach. Another one is to generate uniformM % maxBound
but then one have to convert to Ratio Integer
since basically any operation could overflow denominator.
Ratio a
, when a
has finite precision, is a very dangerous and unlawful beast. For example, (-11 % 12 :: Ratio Int8) > 13 % 12
. I would rather discourage people from using such types, and I bear no interest in providing utilities to work with them.
On the other hand, if we decide to give a green light to new instances of Random
, we can define instance Random Rational
just applying toRational
to Doubles
in [0, 1].
Ratio a, when a has finite precision, is a very dangerous and unlawful beast.
Sounds reasonable.
On the other hand, if we decide to give a green light to new instances of Random, we can define instance Random Rational just applying toRational to Doubles in [0, 1].
I like it.
On the other hand, if we decide to give a green light to new instances of Random, we can define instance Random Rational just applying toRational to Doubles in [0, 1].
This would be inappropriate in testing, but there's no reason why random
generators should be used for testing per-se.
As long as this is documented well, I don't really mind, but we might as well just have some functions then, instead of an instance.
On the other hand, if we decide to give a green light to new instances of Random, we can define instance Random Rational just applying toRational to Doubles in [0, 1].
That sounds as reasonable choice for Random instance. BTW it's not necessary to involve Doubles we can just use:
n <- (fromIntegral :: Word64 -> Integer) <$> unform
return $ n % (2^64-1)
There is currently no instances for
Ratio
data type that can give us random valuesI can see a reasonable
Uniform
instance for integral types, which excludes the zero denominator:An efficient version for
UniformRange
needs some thought.Note that neither
Uniform
, norUniformRange
would be able to generateRational
. This means that we can attempt and cook up such instance forRandom