upiterbarg / mpmath

Automatically exported from code.google.com/p/mpmath
Other
0 stars 0 forks source link

Loss of precision when calculation hypergeometric distribution with binomial #185

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I am implementing a hypergeomtric distribution in Python using mpmath to 
reproduce the output of phyper in R.
My code look like this:
def mhyper(self,q,m,n,k):
        """
        loc.mhyper(q,m,n,k)
        Calculate p-value using binomial function from mpmath
        """
        array = numpy.zeros(1,dtype='Float64')
        for i in xrange(q,min(nSize,n)):
            array[0]+=(binomial(m,i)*binomial(n,k-i))/binomial(n+m,k)
        return array[0]

Nevertheless when I compare the result with R the result is loosing 
precision with some numbers. Here are some examples, the first line are the 
parameters, 2nd R phyper and 3rd mpmath:
4 24 480 140
0.844578810917
0.844578810326
###########################################################################
0 24 480 64
0.964621488766
0.964621488766
###########################################################################
1 24 480 93
0.955579097831
0.955579097831
###########################################################################
7 26 520 201
0.804541841026
0.804538365798
###########################################################################
2 26 520 124
0.958341587794
0.958341587621
###########################################################################
4 26 520 151
0.890202111167
0.890202100093
###########################################################################
0 26 520 72
0.976954090453
0.976954090453
###########################################################################
1 26 520 101
0.969094038618
0.969094038616
###########################################################################

I am using Python 2.6.4 and mpath 0.14 under Ubuntu 9.10. 

The mpmath is also taking a lot more time than using rpy2 has an interface 
to R. Is there anyway to speed up things? 

Thanks in advance,
Bruno 

Original issue reported on code.google.com by bacmsan...@gmail.com on 9 Feb 2010 at 12:00

GoogleCodeExporter commented 9 years ago
What is nSize? I can't reproduce the function. For example, with q,m,n,k =
4,24,480,140 I can't reproduce the value 0.8445788...:

>>> for nSize in range(15): mhyper(None,4,24,480,140)
...
0.0
0.0
0.0
0.0
0.0
0.092590799653249331
0.23858904606301057
0.41897705282651615
0.59810256633560444
0.74357770500085185
0.8413947991941032
0.89631211064831429
0.92219905880854847
0.9324766184829596
0.93591665588977924

Original comment by fredrik....@gmail.com on 9 Feb 2010 at 2:11

GoogleCodeExporter commented 9 years ago
Sorry I forgot you needed that bit. nSize=21

Original comment by bacmsan...@gmail.com on 9 Feb 2010 at 2:57

GoogleCodeExporter commented 9 years ago
>>> nSize = 21
>>> mhyper(None,4,24,480,140)
0.93716960997917065

Assuming 4 24 480 140 = q,m,n,k.

What am I doing wrong?

Original comment by fredrik....@gmail.com on 9 Feb 2010 at 3:01

GoogleCodeExporter commented 9 years ago
Again my fault. 
I am printing the parameters as if they were being used for phyper. So for the 
4 24 480 
140 = q,m,n,k you actually have to do 5 24 480 140:

>>> mhyper(None,5,24,480,140)
0.8445788103259213

Original comment by bacmsan...@gmail.com on 9 Feb 2010 at 3:08

GoogleCodeExporter commented 9 years ago
Ok, thanks. Now, if I try one of the examples with large error (7 26 520 201):

>>> from mpmath import *
>>> q,m,n,k = 8,26,520,201
>>> print sum(binomial(m,i)*binomial(n,k-i)/binomial(n+m,k) for i in 
xrange(q,21))
0.804538365798353

This seems accurate to me.

Also Mathematica gives:

In[29]:= Sum[B[m,i] B[n,k-i] / B[n+m,k], {i, q, 20}]

         148981954195755770507070665875672075
Out[29]= ------------------------------------
         185176941870160889963934431310666956

In[30]:= N[%,50]

Out[30]= 0.80453836579835256115310153235447419254469779811393

Are you sure it's not R that's losing precision?

As for speed, you could determine the ratio between successive terms (the ratio
should be a rational function) instead of computing the binomial coefficients 
from
scratch. Alternatively, you could express the sum using a generalized 
hypergeometric
function, for example like this:

from mpmath import *

def mhyper1(q,m,n,k,nSize):
    return sum(binomial(m,i)*binomial(n,k-i)/binomial(n+m,k) for i in
xrange(q,min(nSize,n)))

def mhyper2(q,m,n,k,nSize):
    u = min(nSize,n)-1
    A = binomial(m,q)*binomial(n,k-q)
    B = hyper([1,q-k,q-m],[1+q,1-k+n+q], 1)
    C = -binomial(m,1+u)*binomial(n,k-1-u)
    D = hyper([1,1-k+u,1-m+u],[2+u,2-k+n+u], 1)
    return (A*B + C*D) / binomial(m+n, k)

Then,

>>> q,m,n,k,nSize = 5,24,480,140,21
>>> print mhyper1(q,m,n,k,nSize)
0.844578810325921
>>> print mhyper2(q,m,n,k,nSize)
0.844578810325921
>>>
>>> 1/timing(mhyper1,q,m,n,k,nSize)
85.24755894260538
>>> 1/timing(mhyper2,q,m,n,k,nSize)
692.77046642154176

so it's 7x faster or so.

Original comment by fredrik....@gmail.com on 9 Feb 2010 at 4:03

GoogleCodeExporter commented 9 years ago
I actually assumed that R was giving me the correct result and that mpmath was 
the one 
loosing precision. I always tend to trust more in R but that might be the case. 
I will 
double check the value with rpy2 compared to R to see if the problem is only 
with rpy2. 

I am not a mathematician so I am afraid I have no idea what you are talking 
about when 
you say to calculate the ration between terms. And I am still trying to get 
what are 
you exactly doing in your functions. 

Original comment by bacmsan...@gmail.com on 9 Feb 2010 at 4:48

GoogleCodeExporter commented 9 years ago
Any sum of this form over binomial coefficients can be expressed in closed form 
using
the generalized hypergeometric function.

I computed the formula using Mathematica like this:

In[1]:= B = Binomial
Out[1]= Binomial
In[2]:= Sum[B[m,i]B[n,k-i]/B[n+m,k], {i,q,u}]
Out[2]= (Binomial[m, q] Binomial[n, k - q]
>       HypergeometricPFQ[{1, -k + q, -m + q}, {1 + q, 1 - k + n + q}, 1] -
>      Binomial[m, 1 + u] Binomial[n, -1 + k - u]
>       HypergeometricPFQ[{1, 1 - k + u, 1 - m + u}, {2 + u, 2 - k + n + u},
>        1]) / Binomial[m + n, k]

Closing this issue since there doesn't seem to be a bug.

Original comment by fredrik....@gmail.com on 9 Feb 2010 at 4:58