haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io
Other
5.99k stars 1.12k forks source link

Faulty values calling BinomialDistribution's RNG #692

Closed jaltevogt closed 2 years ago

jaltevogt commented 2 years ago

For carefully selected success rate p ~=0.999000999000999 the binomial distributions random number generator seems bugged. Maybe some numerical instabilities?

Code snippet:

int n=1000;
List<Double> pList = new ArrayList<>();
pList.add(0.999);
pList.add(0.99900099900099);
pList.add(0.999000999000999);
pList.add(0.9990009999);
pList.add(0.999001);
pList.add(0.999002);
pList.add(0.999006);
pList.add(0.9991);
pList.add(0.9992);
for (double p : pList) {
    BinomialDistribution distb = new BinomialDistribution(n,p);
    double rn = distb.rand();
    if (rn<900 || rn>1000) {
        System.out.println(rn + " for n="+n+ " and p="+p);
    }
}

results in:

-2.147482648E9 for n=1000 and p=0.999000999000999
469955.0 for n=1000 and p=0.9990009999
289762.0 for n=1000 and p=0.999001
1286.0 for n=1000 and p=0.999002
1153.0 for n=1000 and p=0.999006
1002.0 for n=1000 and p=0.9991

Can you reproduce this?

haifengl commented 2 years ago

It is caused by underflow/overflow. Those p are too large (very close to 1). Although we can harden the code to throw exceptions in such case, it may not be very useful for random number generations.

haifengl commented 2 years ago

A workaround is to use a large n (e.g. 10x) and then scale down the random number by dividing 10.

jaltevogt commented 2 years ago

Thanks, scaling down works. I'm not familiar with the inner workings of the rng, but I just realized BinomialDistribution.rand() uses PoissonDistribution.tinyLambdaRand() as a workaround in case n*p < 1.E-6 . Should this workaroun be used in case p/n<1.E-6 or (1-p)/n < 1.E-6 .

haifengl commented 2 years ago

I made some enhancement. It works fine with your examples. Please try master branch.

jaltevogt commented 2 years ago

Thanks for the fix!