stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
752 stars 188 forks source link

inv_Phi() can be faster #2555

Closed spinkney closed 3 years ago

spinkney commented 3 years ago

I don't know how permissive it is to take R source code and modify it for our use...but modifying R's qnorm function and adding it to Stan is about 2x faster than calling the inv_Phi function.

 vector inv_Phi_stan(vector x) {
    return inv_Phi(x);
  }

 real qnorm_stan(real p) {
    real r;
    real val;
    real q = p - 0.5;

    if (fabs(q) <= .425) {
      r = .180625 - q * q;
        val = q * (((((((r * 2509.0809287301226727 +
                       33430.575583588128105) * r + 67265.770927008700853) * r +
                     45921.953931549871457) * r + 13731.693765509461125) * r +
                   1971.5909503065514427) * r + 133.14166789178437745) * r +
                 3.387132872796366608) / (((((((r * 5226.495278852854561 +
                     28729.085735721942674) * r + 39307.89580009271061) * r +
                   21213.794301586595867) * r + 5394.1960214247511077) * r +
                 687.1870074920579083) * r + 42.313330701600911252) * r + 1.0);
    } else { /* closer than 0.075 from {0,1} boundary */
        if (q > 0) r = 1.0 - p;
          else r = p;

        r = sqrt(-log(r));

      if (r <= 5.) { /* <==> min(p,1-p) >= exp(-25) ~= 1.3888e-11 */
        r += -1.6;
        val = (((((((r * 0.00077454501427834140764 +
                       .0227238449892691845833) * r + .24178072517745061177) *
                     r + 1.27045825245236838258) * r +
                    3.64784832476320460504) * r + 5.7694972214606914055) * r + 4.6303378461565452959) * r +
                 1.42343711074968357734) / (((((((r *
                         0.00000000105075007164441684324 + 0.0005475938084995344946) *
                        r + .0151986665636164571966) * r +
                       .14810397642748007459) * r + .68976733498510000455) *
                     r + 1.6763848301838038494) * r +
                    2.05319162663775882187) * r + 1.);
        } else { /* very close to  0 or 1 */
            r += -5.;
            val = (((((((r * 0.000000201033439929228813265 +
                       0.0000271155556874348757815) * r +
                      .0012426609473880784386) * r + .026532189526576123093) *
                    r + .29656057182850489123) * r +
                   1.7848265399172913358) * r + 5.4637849111641143699) *
                 r + 6.6579046435011037772) / (((((((r *
                        0.00000000000000204426310338993978564 + 0.00000014215117583164458887)*
                        r + 0.000018463183175100546818) * r +
                       0.0007868691311456132591) * r + .0148753612908506148525)
                     * r + .13692988092273580531) * r +
                    .59983220655588793769) * r + 1.);
        }

    if(q < 0.0) val = -val;
  }
    return val;
}

 vector qnorm_stan_vec (vector p) {
   int N = num_elements(p);
   vector[N] out;

   for (n in 1:N) 
     out[n] = qnorm_stan(p[n]);

   return out;
 }

Running

library(rstan)
library(microbenchmark)

mod <- stan_model('~/inv_phi_test.stan')
expose_stan_functions(mod)

p <- runif(1000)

time = microbenchmark::microbenchmark(
     inv_Phi_stan(p),
     qnorm(p),
     qnorm_stan_vec(p),
     times = 1000
  )
time
time
Unit: microseconds
              expr   min    lq     mean  median     uq    max neval
   inv_Phi_stan(p) 52.43 53.05 60.37816 53.3550 69.540 184.70  1000
          qnorm(p) 32.81 33.12 40.49271 33.3300 52.430 249.98  1000
 qnorm_stan_vec(p) 21.88 22.57 25.93278 22.8800 29.610 156.38  1000

Looking at the results seems good

get_invphi = function(x) {
  data.frame(
    x = x,
    invPhi_old = inv_Phi_stan(x),
    invPhi_R = qnorm(x),
    invPhi_new = qnorm_stan(x)
  )
}
> purrr::map_dfr(seq(from = 0, to = 1, by = 0.001),get_invphi)
        x invPhi_old   invPhi_R invPhi_new
1   0.000       -Inf       -Inf        NaN
2   0.001 -3.0902323 -3.0902323 -3.0902323
3   0.002 -2.8781617 -2.8781617 -2.8781617
4   0.003 -2.7477814 -2.7477814 -2.7477814
5   0.004 -2.6520698 -2.6520698 -2.6520698
6   0.005 -2.5758293 -2.5758293 -2.5758293
7   0.006 -2.5121443 -2.5121443 -2.5121443
8   0.007 -2.4572634 -2.4572634 -2.4572634
9   0.008 -2.4089155 -2.4089155 -2.4089155
10  0.009 -2.3656181 -2.3656181 -2.3656181
11  0.010 -2.3263479 -2.3263479 -2.3263479
...
bob-carpenter commented 3 years ago

Unfortunately, R is distributed under the GPL license. It's not legal to include R code and release Stan under BSD. The other way around is fine---including BSD code as part of a GPL project. See the Wikipedia page on copyleft.

Here's what the doc for qnorm says:

For qnorm, the code is a C translation of Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics, 37, 477–484. which provides precise results up to about 16 digits.

So we could just write C++ from that algorithm spec and we should be fine.

But you'll also see this is only giving you 16 digits of accuracy. I don't know precise the C++ function std::erf is, but the standard library tends to select higher precision over fast algorithms. The Eigen matrix lib also provides faster, but less accurate versions of library functions. That's also the trick Intel uses to get more speed out of their math library.

You could try using std::erfl to compute a reference answer to 16 decimal places with long double types and see how Wichura's algorithm compares to std::erf.

spinkney commented 3 years ago

Does the C++ standard library have the inverse error function? All I see is the regular std::erf functions. Or are you suggesting testing on erf(inv_erf(x)) and seeing how close to x it is?

What amount of accuracy does Stan math require?

The inv_Phi() function in Stan math is from Peter Acklam unfortunately the link in the comments doesn't work but can be found at https://web.archive.org/web/20151030215612/http://home.online.no/~pjacklam/notes/invnorm/. He says on the site, "The absolute value of the relative error is less than 1.15 × 10−9 in the entire region."

bob-carpenter commented 3 years ago

Sorry---forgot the inverse! No, the C++ standard library doesn't have the inverse normal cdf.

What amount of accuracy does Stan math require?

As much as it can get? Seriously, it's a judgement call. The problem is that you only need increased precision in some circumstances, but in those circumstances, you usually really need it. But it doesn't sound like that inverse normal cdf is that accurate, which is not atypical of double-precision functions (getting a bit more than single-precision accuracy).

I'd like to hear what @bgoodri thinks about this particular case.

bgoodri commented 3 years ago

We are soon going to be officially awarded a NSF grant to, in part, address inverse CDF (a.k.a. quantile) functions in Stan. Part of that will entail using the quantile functions that are already in Boost. So, you might try to one for the inverse standard normal, which internally uses

https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/sf_erf/error_inv.html

Part of the grant will also entail using the unidimensional root-finding functions in Boost to evaluate the quantile function by numerically inverting the CDF. Since the derivatives are readily available in this case, one could use

https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/roots_deriv.html

Although that may entail having a faster and / or more accurate function to evaluate the standard normal CDF, such as that in

https://www.jstatsoft.org/article/view/v011i04

Bob would be happier if you did something based on the code in the paper, as opposed to the code in the GPL'ed replication archive, even though those are identical. The replication archive has some other code from Sun etc. that has a BSD compatible license.

Ben

On Wed, Aug 4, 2021 at 3:12 PM Bob Carpenter @.***> wrote:

Sorry---forgot the inverse! No, the C++ standard library doesn't have the inverse normal cdf.

What amount of accuracy does Stan math require?

As much as it can get? Seriously, it's a judgement call. The problem is that you only need increased precision in some circumstances, but in those circumstances, you usually really need it. But it doesn't sound like that inverse normal cdf is that accurate, which is not atypical of double-precision functions (getting a bit more than single-precision accuracy).

I'd like to hear what @bgoodri https://github.com/bgoodri thinks about this particular case.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2555#issuecomment-892905500, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKV22M642CH7H2NSTJ3T3GGLLANCNFSM5BRRKBSA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

bob-carpenter commented 3 years ago

Bob would be happier if you did something based on the code in the paper, as opposed to the code in the GPL'ed replication archive, even though those are identical.

The code in the paper is Fortran. We can't just include the C source code from R, because it's GPL-ed.

I do not believe there's a legal issue with reimplementing the algorithm in C++ from the paper.

For a better read on this, we can ask NumFOCUS's IP lawyers, though our devs haven't always believed the lawyers' responses in the past.

bgoodri commented 3 years ago

First line of the abstract: "This article provides a little table-free C function that evaluates the normal distribution ..."

On Wed, Aug 4, 2021 at 4:03 PM Bob Carpenter @.***> wrote:

Bob would be happier if you did something based on the code in the paper, as opposed to the code in the GPL'ed replication archive, even though those are identical.

The code in the paper is Fortran. We can't just include the C source code from R, because it's GPL-ed.

I do not believe there's a legal issue with reimplementing the algorithm in C++ from the paper.

For a better read on this, we can ask NumFOCUS's IP lawyers, though our devs haven't always believed the lawyers' responses in the past.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2555#issuecomment-892936499, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKTILKKPT3F7EUO7MSTT3GMH3ANCNFSM5BRRKBSA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

bob-carpenter commented 3 years ago

What does "table-free" mean? The code has stuff like this in it, which looks like a table that's been inlined.

q * (((((((r * 2509.0809287301226727 +
                       33430.575583588128105) * r + 67265.770927008700853) * r +
                     45921.953931549871457) * r + 13731.693765509461125) * r +
                   1971.5909503065514427) * r + 133.14166789178437745) * r +
                 3.387132872796366608) / (((((((r * 5226.495278852854561 +
                     28729.085735721942674) * r + 39307.89580009271061) * r +
                   21213.794301586595867) * r + 5394.1960214247511077) * r +
                 687.1870074920579083) * r + 42.313330701600911252) * r + 1.0);
spinkney commented 3 years ago

closed with #2566

bob-carpenter commented 3 years ago

@spinkney: Thanks for closing the issue. If you had included the string "fixes #2555" in one of the commas for the PR, the issue would've been auto-closed.

I hadn't realized until checking now that the reverse-mode version just calls this function for the value, so this'll also speed up the autodiff version.