yaxu opened this issue 2 years ago
@yaxu I don't fully remember why I settled on 300 ages ago... but doing a few quick tests now, if you stretch the algorithm over 300000 or 3000000 cycles, and then sample at 1/16 of a cycle, there is a clear pattern of cyclical behaviour. Here's at 300000:
```haskell
*Test> fmap timeToRand $ fmap (/16) [0..64]
[0.0,5.2969591692090034e-2,0.10644272342324257,0.15890990756452084,0.21238190680742264,0.2804735545068979,0.31832335516810417,0.3859429322183132,0.4247638750821352,0.4928864538669586,0.5609471704810858,0.613443735986948,0.6366466525942087,0.6891457810997963,0.7723894044756889,0.8405124042183161,0.8490547277033329,0.9171469118446112,0.9862459301948547,7.409052923321724e-3,0.12189428322017193,0.1744072400033474,0.22688753344118595,0.29502829164266586,0.27329336665570736,0.3414011038839817,0.37829150445759296,0.44775657169520855,0.5443057864904404,0.6006928700953722,0.6814978308975697,0.737887954339385,0.6976059153676033,0.7622828707098961,0.8342937659472227,0.8985254727303982,0.9724919218569994,8.391769230365753e-2,1.4818167313933372e-2,0.1888825185596943,0.24331554397940636,0.17519287578761578,0.3492875024676323,0.29679072834551334,0.4533020444214344,0.40080306492745876,0.5905296057462692,0.5219334363937378,0.5465866755694151,0.7206516228616238,0.6828022692352533,0.8725522384047508,0.7565830703824759,0.9524216447025537,0.8960166834294796,9.76302158087492e-2,8.810803294181824e-2,1.6092436388134956e-2,0.2018892802298069,0.12987672351300716,0.36299572326242924,0.3066075071692467,0.47577585093677044,0.4193868599832058,0.39521189220249653]
```
30000 looks "better" - although if one were to sample the algorithm ten times faster I guess the same issue would appear:
```haskell
*Test> fmap timeToRand $ fmap (/16) [0..64]
[0.0,5.2969591692090034e-2,0.10644272342324257,0.15890990756452084,0.21238190680742264,0.2804735545068979,0.31832335516810417,0.3859429322183132,0.4247638750821352,0.4928864538669586,0.5609471704810858,0.613443735986948,0.6366466525942087,0.6891457810997963,0.7723894044756889,0.8405124042183161,0.8490547277033329,0.9171469118446112,0.9862459301948547,7.409052923321724e-3,0.12189428322017193,0.1744072400033474,0.22688753344118595,0.29502829164266586,0.27329336665570736,0.3414011038839817,0.37829150445759296,0.44775657169520855,0.5443057864904404,0.6006928700953722,0.6814978308975697,0.737887954339385,0.6976059153676033,0.7622828707098961,0.8342937659472227,0.8985254727303982,0.9724919218569994,8.391769230365753e-2,1.4818167313933372e-2,0.1888825185596943,0.24331554397940636,0.17519287578761578,0.3492875024676323,0.29679072834551334,0.4533020444214344,0.40080306492745876,0.5905296057462692,0.5219334363937378,0.5465866755694151,0.7206516228616238,0.6828022692352533,0.8725522384047508,0.7565830703824759,0.9524216447025537,0.8960166834294796,9.76302158087492e-2,8.810803294181824e-2,1.6092436388134956e-2,0.2018892802298069,0.12987672351300716,0.36299572326242924,0.3066075071692467,0.47577585093677044,0.4193868599832058,0.39521189220249653]
```
So I guess 300 was probably something like a number that was sure to push this potential for short-term cyclic behaviour far out of salience?
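For anyone following along, the shape of the pipeline being discussed can be sketched like this. This is my own simplified illustration, not a verbatim copy of Tidal's source: `timeToRand'` and `stretch` are made-up names, and the shift triple (13, 17, 5) is Marsaglia's classic 32-bit one, which may differ from the exact shifts Tidal uses:

```haskell
import Data.Bits (shiftL, shiftR, xor)

-- One pass of xorshift (Marsaglia's classic 13/17/5 triple).
-- Illustrative only; not necessarily the exact shifts in Tidal.
xorshift :: Int -> Int
xorshift x =
  let a = x `xor` (x `shiftL` 13)
      b = a `xor` (a `shiftR` 17)
  in  b `xor` (b `shiftL` 5)

-- Map a time to [0,1): spread time over `stretch` cycles, take the
-- fractional part, widen it to an integer seed, scramble, normalise.
-- By construction the whole thing repeats every `stretch` cycles.
timeToRand' :: Double -> Double -> Double
timeToRand' stretch t =
  let frac = snd (properFraction (t / stretch) :: (Int, Double))
      seed = truncate (frac * 536870912)            -- 2^29
  in  fromIntegral (xorshift seed `mod` 536870912) / 536870912
```

With `stretch = 300` the stream repeats every 300 cycles by construction; raising the constant to 300000 only pushes the repeat further out rather than removing it.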
Another thought might be to make this user-configurable: a parameter that is read from the environment. If another parameter, some kind of offset, was also added, there would be two parameters controlling the generator. That might retain the blinding speed of this algorithm, while also giving a somewhat more intuitive UI for having multiple random streams, and addressing this issue (when it is an issue).
Re: the negative time issue... timeToRand seems to work fine with negative times, so I suspect it is something to do with the way it is currently integrated into rand:
```haskell
rand = Pattern (\(State a@(Arc s e) _) -> [Event (Context []) Nothing a (realToFrac $ (timeToRand ((e + s)/2) :: Double))])
```
I am a bit suspicious of that double conversion in the middle of things there. Could that be doing weird things, and might changing it to Rational help?
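As a toy illustration of the Rational idea (`midpoint` is a hypothetical helper, not the actual patch): keeping the arc midpoint in `Rational` means it stays exact, and the one lossy conversion can then happen once, at the very end:

```haskell
import Data.Ratio ((%))

-- Midpoint of an arc, kept in Rational so no precision is lost
-- before a final realToFrac.
midpoint :: Rational -> Rational -> Rational
midpoint s e = (s + e) / 2
```

For example `midpoint (1 % 3) (2 % 3)` is exactly `1 % 2`, whereas in `Double` the endpoints are already rounded before the sum is even taken.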
I did a bit of digging last weekend and found repeating patterns everywhere, on multiple scales. My final conclusion is that xorshift is not suitable for the way we are trying to use it. The problem is that random numbers based on xorshift are meant to be generated by reapplying xorshift consecutively, i.e.

```haskell
rnd = seed : xorshift seed : xorshift (xorshift seed) : xorshift^3 seed ...
```

while what we do is something like

```haskell
rnd = [xorshift seed1, xorshift seed2, xorshift seed3, ...]
```

Unfortunately, xorshift does not scramble the bits enough, so when similar numbers are used as seeds, similar numbers actually come out after the first iteration of xorshift.
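To make that concrete, here is a small demonstration I sketched (using Marsaglia's 64-bit 13/7/17 shift triple, not necessarily the exact variant in Tidal). A single xorshift pass is built only from shifts and xors, so it is linear over GF(2); for any even seed `n` we have `n + 1 == n `xor` 1`, and therefore `xs n `xor` xs (n+1) == xs 1`, a constant with only 6 of 64 bits set:

```haskell
import Data.Bits (popCount, shiftL, shiftR, xor)
import Data.Word (Word64)

-- One pass of 64-bit xorshift (Marsaglia's 13/7/17 triple).
xs :: Word64 -> Word64
xs x =
  let a = x `xor` (x `shiftL` 13)
      b = a `xor` (a `shiftR` 7)
  in  b `xor` (b `shiftL` 17)

-- Shifts and xors are linear over GF(2): xs (a `xor` b) == xs a `xor` xs b.
-- So for every even n, the outputs for n and n+1 differ by the fixed
-- constant xs 1, which has only 6 bits set -- consecutive seeds stay
-- visibly correlated after one pass.
```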
My suggestion is to use a hash function to randomise the time seed instead of xorshift. I looked around a bit and found murmur3 to be a suitable candidate: it is simple and fast. It has a few more operations than xorshift, but I think it should still be simple enough not to cause a performance bottleneck. I tested the generated numbers a bit by plotting different samplings of them, and they seem fine to me.
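For reference, the scrambling step of murmur3 referred to here is its finaliser; below is a 32-bit sketch with the standard published constants (the generic murmur3 "fmix" step, not necessarily the exact code in the PR):

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word32)

-- The standard 32-bit murmur3 finaliser: two multiplications
-- interleaved with xor-shifts. The multiplications mix low bits
-- into high bits, so nearly identical inputs diverge quickly.
-- Every step is invertible, so fmix32 is a bijection on Word32.
fmix32 :: Word32 -> Word32
fmix32 h0 =
  let h1 = h0 `xor` (h0 `shiftR` 16)
      h2 = h1 * 0x85ebca6b
      h3 = h2 `xor` (h2 `shiftR` 13)
      h4 = h3 * 0xc2b2ae35
  in  h4 `xor` (h4 `shiftR` 16)
```

Unlike a single xorshift pass, this gives strong avalanche behaviour even when the seeds are consecutive integers, which is exactly the usage pattern described above.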
The implementation that is currently in pull request #942 has a period of 2^16 = 65536 cycles and divides a cycle into 2^16 values, this should be enough I think (if someone were to play 10 cycles per second it would still only repeat after almost 2 hours).
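Sanity-checking that arithmetic (a plain calculation, nothing Tidal-specific):

```haskell
periodCycles, periodSeconds, periodHours :: Double
periodCycles  = 2 ^ (16 :: Int)        -- 65536 cycles before the stream repeats
periodSeconds = periodCycles / 10      -- at 10 cycles per second: 6553.6 s
periodHours   = periodSeconds / 3600   -- about 1.82 h, i.e. "almost 2 hours"
```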
This looks great to me. Thanks @fbous !!!
I've noticed that `rand` has a period of 300 cycles. I haven't heard of anyone noticing it loop, but it's feasible that someone could under some conditions.
@dktr0 do you see an advantage in setting at that, rather than something a few orders of magnitude higher?
Also what looks more like a bug, negative time gives negative numbers: