mmaul / clml

Common Lisp Machine Learning Library
Other
259 stars 36 forks source link

Issues with computing quantile function of beta distribution #33

Open TeMPOraL opened 6 years ago

TeMPOraL commented 6 years ago

I've encountered the following issue:

(clml.statistics:quantile (clml.statistics:beta-distribution 0.01 0.01) 0.23)

aborts after second iteration of #'newton-raphson through an assertion in #'regularized-incomplete-beta (see trace below).

However, the same parameters tested on Python (SciPy), R and Wolfram|Alpha Open Code all agree the result should be ~ 1.856702e-34.

I've looked into the code of #'quantile, comaring it with SciPy (btdtri()) and R (qbeta()); it seems to me that the Python/R versions are implemented to handle some corner cases the Lisp code doesn't.

Related, (clml.statistics:quantile (clml.statistics:beta-distribution 0.1 0.9) 0.23) gyrates for a while, and eventually breaks on division by zero (Wolfram suggests the result should be 4.886*10^-7).

Trace:

CL-USER> (trace)
(CLML.STATISTICS.MATH:NEWTON-RAPHSON CLML.STATISTICS:CDF
                                     CLML.STATISTICS:DENSITY
                                     CLML.STATISTICS.MATH:REGULARIZED-INCOMPLETE-BETA)
CL-USER> (clml.statistics:quantile (clml.statistics:beta-distribution 0.01 0.01) 0.23)
  0: (CLML.STATISTICS.MATH:NEWTON-RAPHSON
      #<CLOSURE (LAMBDA (CLML.STATISTICS::X)
                  :IN
                  CLML.STATISTICS:QUANTILE) {1002C1D25B}>
      #<CLOSURE (LAMBDA (CLML.STATISTICS::X)
                  :IN
                  CLML.STATISTICS:QUANTILE) {1002C1D27B}>
      :INITIAL-GUESS 0.059905716600583525)
    1: (CLML.STATISTICS:CDF
        #<CLML.STATISTICS:BETA-DISTRIBUTION : SHAPE = (0.01, 0.01)>
        0.059905716600583525)
      2: (CLML.STATISTICS.MATH:REGULARIZED-INCOMPLETE-BETA 0.01 0.01
                                                           0.059905716600583525)
      2: CLML.STATISTICS.MATH:REGULARIZED-INCOMPLETE-BETA returned
           0.48649456283512205
    1: CLML.STATISTICS:CDF returned 0.48649456283512205
    1: (CLML.STATISTICS:DENSITY
        #<CLML.STATISTICS:BETA-DISTRIBUTION : SHAPE = (0.01, 0.01)>
        0.059905716600583525)
    1: CLML.STATISTICS:DENSITY returned 0.08627940090408258
    1: (CLML.STATISTICS:CDF
        #<CLML.STATISTICS:BETA-DISTRIBUTION : SHAPE = (0.01, 0.01)>
        -2.9129309065960576)
      2: (CLML.STATISTICS.MATH:REGULARIZED-INCOMPLETE-BETA 0.01 0.01
                                                           -2.9129309065960576)
; Evaluation aborted on #<SIMPLE-ERROR "A and B should be nonnegative and X should be less than 1." {1002C39C63}>.
mmaul commented 6 years ago

That is not exactly surprising as I expect SciPy and R versions has had more eyes on it. That said there is no reason why we can fix things when we find things....

TeMPOraL commented 6 years ago

FWIW, I traced SciPy implementation of the beta-quantile function to the Cephes library. In particular, I managed to get the code from this fork working (and giving results in agreement with SciPy) in C and through CFFI, so that would be the implementation to compare to.

The function in question is incbi.