mathnet / mathnet-numerics

Math.NET Numerics
http://numerics.mathdotnet.com
MIT License
3.51k stars 897 forks source link

Error in sampling from Negative Binomial #320

Open Mlambrechts opened 9 years ago

Mlambrechts commented 9 years ago

I'm using Math.Net to sample values from an overdispersed Poisson distribution. I'm doing this using the negative binomial link, as described here: https://stat.ethz.ch/pipermail/r-help/2002-June/022425.html

My code currently looks like this:

private static double ODPoisson(double lambda, double dispersion)
{
    double p = 1 / (dispersion - 1);
    double r = lambda * p;

    if (dispersion == 1)
    {
        return Poisson.Sample(lambda);
    }
    else
    {
        return NegativeBinomial.Sample(r, p);
    }
}

What I've found is that this works well for low values of lambda. As soon as I try to sample with a lambda of 1000 and a dispersion parameter of 2, the code simply 'hangs', i.e. method keeps running but no value is returned. I've even looped through this method to test various combinations of input parameters (lambda from 1 to 1000, dispersion = 2), and the code 'hangs' at different combinations every time. Sometimes it'll sample for all combinations up to lambda = 750, other times up to lambda = 500. This happens by simply re-running the console app and making no code changes.

I've included the 'IsValidParameterSet' check before every run, and even though the parameters are considered valid, the sample is still not generated. To further test whether the input parameters are valid, I've tested the same parameters with the NegativeBinomial.CDF method at the 50th percentile, and it returns a value every time.

Is there an error in the NegativeBinomial.Sample method? If not, how do I fix this problem?

cdrnet commented 9 years ago

Thanks for reporting this.

The current implementation is indeed not suitable for these parameters, as the acceptance area is not only getting very small, it is actually getting to zero (floating point precision) resulting in an infinite loop

Tasks:

jjdelvalle commented 7 years ago

I noticed this is still an issue.

How about using the efficient methods described here: http://link.springer.com/article/10.1007/BF02293108

They may not be as accurate as the one you're using now (they may be though, not quite sure) but I think they'd be a good replacement or at very least good fallback methods for this kinda case.

cdrnet commented 7 years ago

Related: #455

Arlofin commented 2 years ago

There are several issues here:

  1. The code provided by @Mlambrechts is wrong, parameters r and p need to be determined as double p = 1 / dispersion and double r = lambda / (dispersion - 1)
  2. p = 0 actually should be considered to be an invalid parameter, since the distribution is not defined in this case (asymptotically, it reaches infinity, which cannot be represented as integer). In the current implementation, p = 0 not only leads to unhandled undefined mathematical operations (like division by zero), but also to an endless loop in the sampling method. I fixed this in #960 by disallowing p = 0. Note that the run time for sampling still goes to infinity when p approaches zero. Further improvements are welcome.