rleonid / oml

OCaml Math Library
Apache License 2.0
119 stars 9 forks source link

Evaluate Random against Dieharder #110

Open rleonid opened 9 years ago

rleonid commented 9 years ago

The source code claims that the sequence of ints generated passes the famous Diehard tests of randomness.

But does it Dieharder?

smondet commented 8 years ago

so far I have:

RNG PASSED FAILED WEAK
URandom 60 1 3
OCaml 63 1 0
OCaml-50%-bias 19 45 0

Where this is the OCaml test:

let () =
  Random.self_init ();
  while true do
    print_char Random.(int 256 |> char_of_int);
    flush stdout;
  done

50% bias means print_char Random.(int 128 |> char_of_int); (for validation purposes),

and /dev/urandom is on:

 $ uname -srpi
Linux 2.6.32-358.el6.x86_64 x86_64 x86_64
 $ cat /etc/lsb-release
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
 $ cat /etc/centos-release 
CentOS release 6.4 (Final)

OCaml and urandom fail the same test: -d 201 a.k.a. rgb_minimum_distance:

$ dieharder -d 201 -h
#
#            THE GENERALIZED MINIMUM DISTANCE TEST
#
# This is the generalized minimum distance test, based on the paper of M.
# Fischler in the doc directory and private communications.  This test
# utilizes correction terms that are essential in order for the test not
# to fail for large numbers of trials.  It replaces both
# diehard_2dsphere.c and diehard_3dsphere.c, and generalizes the test
# itself so that it can be run for any d = 2,3,4,5.  There is no
# fundamental obstacle to running it for d = 1 or d>5, but one would need
# to compute the expected overlap integrals (q) for the overlapping
# d-spheres in the higher dimensions.  Note that in this test there is no
# real need to stick to the parameters of Marsaglia.  The test by its
# nature has three controls: n (the number of points used to sample the
# minimum distance) which determines the granularity of the test -- the
# approximate length scale probed for an excess of density; p, the usual
# number of trials; and d the dimension.  As Fischler points out, to
# actually resolve problems with a generator that had areas 20% off the
# expected density (consistently) in d = 2, n = 8000 (Marsaglia's
# parameters) would require around 2500 trials, where p = 100 (the old
# test default) would resolve only consistent deviations of around 1.5
# times the expected density.  By making both of these user selectable
# parameters, dieharder should be able to test a generator pretty much
# as thoroughly as one likes subject to the generous constraints
# associated with the eventual need for still higher order corrections
# as n and p are made large enough.
#
rleonid commented 8 years ago

This is really great work! I'll try to adapt your code to see if it also passes TestU01