Incorrect formula - Githubissues

vigna commented 4 years ago

The formula reported in the book for the probability of overlap:

1 - e^(-u * n^2 / (2 * p))

is slightly off, the actual (approximated) formula is

1 - e^(-u * n(n-1) / (p-1))

The off-by-ones are not so relevant, but the spurious factor 2 halves the actual probability.

I also think it would be nice to tell the user that if un^2 is much smaller than p, the formula is very well approximated by

un^2 / p

(see http://prng.di.unimi.it/#remarks) as it is much easier to understand.

dhardy commented 4 years ago

I think both formulae are wrong? If you take a corner case like nL = P the probability of no overlap, there are P × n! / n = L · n! states with no overlap of P^n states total.

For the first unused value, there are n+1 positions it could lie; for the second, there are n+2 etc. (x! / n!) — but now with (x-n)! combinations of the x = P - nL unused values, i.e.

prob(no overlap) = (L · n! / P^n) · x! / (n! · (x-n)!)
 = L · x! / ((x-n)! · P^n)
 = L · (P - nL)! / (P - n(L + 1))! · P^-n

Where n is comparatively small, this is approximately:

L · (P - nL) ^ n · P^-n
 = L · (1 - nL / P)^n

~Aha, this is approximately your formula!~

vigna commented 4 years ago

Since both formulae are approximated, technically they're both wrong.

Since your approximation is very far from reality (technically: it is asymptotically wrong), probably it would be better to replace it with an approximation closer to reality (technically: one that is asymptotically right).

Does that sound better?

Note that your "formula" L · (1 - nL / P)^n gives values much greater than 1 even for n = 2 (try L=1000, P=1000000).

rust-random / book

Incorrect formula #19