rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand
Other
1.67k stars 432 forks source link

Poisson sample() hangs when lambda is close to max of the float type. #1312

Closed WarrenWeckesser closed 1 month ago

WarrenWeckesser commented 1 year ago

The follow programs appear to hang indefinitely in the call poisson.sample(&mut rng):

f32

use rand_distr::{Distribution, Poisson};

fn main() {
    let lambda = 1.0e37f32;
    let poisson = Poisson::<f32>::new(lambda).unwrap();
    let mut rng = rand::thread_rng();
    let k = poisson.sample(&mut rng);
    println!("{}", k);
}

f64

use rand_distr::{Distribution, Poisson};

fn main() {
    let lambda = 1.0e306f64;
    let poisson = Poisson::<f64>::new(lambda).unwrap();
    let mut rng = rand::thread_rng();
    let k = poisson.sample(&mut rng);
    println!("{}", k);
}

I ran into this issue while testing gh-1296.

dhardy commented 1 year ago

Related to #1290?

WarrenWeckesser commented 1 year ago

The problem is that when $\lambda$ is sufficiently large, the value of magic_val computed here:

https://github.com/rust-random/rand/blob/b593db692daed77e8ca6c7a7a7e8414633abc714/rand_distr/src/poisson.rs#L84

will be nan, because both terms in that expression will overflow to inf, and inf - inf gives nan.

Then down at

https://github.com/rust-random/rand/blob/b593db692daed77e8ca6c7a7a7e8414633abc714/rand_distr/src/poisson.rs#L141-L144

check will be nan, so the comparison in the if statement will be false, and the loop will never break.

When $\lambda$ is that large, terms result * self.log_lambda and crate::utils::log_gamma(F::one() + result) from

https://github.com/rust-random/rand/blob/b593db692daed77e8ca6c7a7a7e8414633abc714/rand_distr/src/poisson.rs#L134-L139

can also overflow, because result will be $\lambda + c\sqrt{\lambda}$, where $c$ is not large. ($\sqrt{\lambda}$ is the standard deviation of the Poisson distribution.)

There is another problem with large $\lambda$, even if it isn't large enough to cause overflow: the terms being subtracted in the expressions shown above will be very close, resulting in significant loss of precision.

I've looked into modifying the calculation when $\lambda$ is large to use an asymptotic version of the check expression, but I don't have anything useful yet.

A different approach to avoiding the problem would be switch to a normal distribution when $\lambda$ is sufficiently large. Convential wisdom is that the normal distribution is a good approximation to the Poisson distribution when $\lambda$ is large, but it would be nice to have a more quantitative result on the quality of the approximation. (John D. Cook has a short blog post about it.)

dhardy commented 3 months ago

Depending on how large a value we are talking about, another solution is simply to return an error in the constructor.

benjamin-lieser commented 3 months ago

I would go for using the normal approximation with lambda bigger than 1e30 The two cdf function have a maximum error of smaller than 1/sqrt(lambda) according to https://www.johndcook.com/blog/normal_approx_to_poisson/ so this is very well justified

numpy does reject lambda bigger than 1e18 R does work for all ranges (or breaks a bit later than ours in my testing). https://github.com/wch/r-source/blob/trunk/src/nmath/rpois.c

benjamin-lieser commented 3 months ago

If we don't care about higher precision than f64 it is actually save to just return lambda if lambda is >= 1e35 The standard deviation is then 1e17.5 which means for a significant digit on the scale of 1e35 it has to be more than 10 sigma away, which basically can't happen.

benjamin-lieser commented 1 month ago

Solved with https://github.com/rust-random/rand/pull/1498 by introducing maximum lambda.