Closed spinkney closed 3 years ago
Unfortunately, R is distributed under the GPL license. It's not legal to include R code and release Stan under BSD. The other way around is fine---including BSD code as part of a GPL project. See the Wikipedia page on copyleft.
Here's what the doc for qnorm
says:
For qnorm, the code is a C translation of Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics, 37, 477–484. which provides precise results up to about 16 digits.
So we could just write C++ from that algorithm spec and we should be fine.
But you'll also see this is only giving you 16 digits of accuracy. I don't know precise the C++ function std::erf
is, but the standard library tends to select higher precision over fast algorithms. The Eigen matrix lib also provides faster, but less accurate versions of library functions. That's also the trick Intel uses to get more speed out of their math library.
You could try using std::erfl
to compute a reference answer to 16 decimal places with long double
types and see how Wichura's algorithm compares to std::erf
.
Does the C++ standard library have the inverse error function? All I see is the regular std::erf
functions. Or are you suggesting testing on erf(inv_erf(x))
and seeing how close to x
it is?
What amount of accuracy does Stan math require?
The inv_Phi()
function in Stan math is from Peter Acklam unfortunately the link in the comments doesn't work but can be found at https://web.archive.org/web/20151030215612/http://home.online.no/~pjacklam/notes/invnorm/. He says on the site, "The absolute value of the relative error is less than 1.15 × 10−9 in the entire region."
Sorry---forgot the inverse! No, the C++ standard library doesn't have the inverse normal cdf.
What amount of accuracy does Stan math require?
As much as it can get? Seriously, it's a judgement call. The problem is that you only need increased precision in some circumstances, but in those circumstances, you usually really need it. But it doesn't sound like that inverse normal cdf is that accurate, which is not atypical of double-precision functions (getting a bit more than single-precision accuracy).
I'd like to hear what @bgoodri thinks about this particular case.
We are soon going to be officially awarded a NSF grant to, in part, address inverse CDF (a.k.a. quantile) functions in Stan. Part of that will entail using the quantile functions that are already in Boost. So, you might try to one for the inverse standard normal, which internally uses
https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/sf_erf/error_inv.html
Part of the grant will also entail using the unidimensional root-finding functions in Boost to evaluate the quantile function by numerically inverting the CDF. Since the derivatives are readily available in this case, one could use
https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/roots_deriv.html
Although that may entail having a faster and / or more accurate function to evaluate the standard normal CDF, such as that in
https://www.jstatsoft.org/article/view/v011i04
Bob would be happier if you did something based on the code in the paper, as opposed to the code in the GPL'ed replication archive, even though those are identical. The replication archive has some other code from Sun etc. that has a BSD compatible license.
Ben
On Wed, Aug 4, 2021 at 3:12 PM Bob Carpenter @.***> wrote:
Sorry---forgot the inverse! No, the C++ standard library doesn't have the inverse normal cdf.
What amount of accuracy does Stan math require?
As much as it can get? Seriously, it's a judgement call. The problem is that you only need increased precision in some circumstances, but in those circumstances, you usually really need it. But it doesn't sound like that inverse normal cdf is that accurate, which is not atypical of double-precision functions (getting a bit more than single-precision accuracy).
I'd like to hear what @bgoodri https://github.com/bgoodri thinks about this particular case.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2555#issuecomment-892905500, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKV22M642CH7H2NSTJ3T3GGLLANCNFSM5BRRKBSA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .
Bob would be happier if you did something based on the code in the paper, as opposed to the code in the GPL'ed replication archive, even though those are identical.
The code in the paper is Fortran. We can't just include the C source code from R, because it's GPL-ed.
I do not believe there's a legal issue with reimplementing the algorithm in C++ from the paper.
For a better read on this, we can ask NumFOCUS's IP lawyers, though our devs haven't always believed the lawyers' responses in the past.
First line of the abstract: "This article provides a little table-free C function that evaluates the normal distribution ..."
On Wed, Aug 4, 2021 at 4:03 PM Bob Carpenter @.***> wrote:
Bob would be happier if you did something based on the code in the paper, as opposed to the code in the GPL'ed replication archive, even though those are identical.
The code in the paper is Fortran. We can't just include the C source code from R, because it's GPL-ed.
I do not believe there's a legal issue with reimplementing the algorithm in C++ from the paper.
For a better read on this, we can ask NumFOCUS's IP lawyers, though our devs haven't always believed the lawyers' responses in the past.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2555#issuecomment-892936499, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKTILKKPT3F7EUO7MSTT3GMH3ANCNFSM5BRRKBSA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .
What does "table-free" mean? The code has stuff like this in it, which looks like a table that's been inlined.
q * (((((((r * 2509.0809287301226727 +
33430.575583588128105) * r + 67265.770927008700853) * r +
45921.953931549871457) * r + 13731.693765509461125) * r +
1971.5909503065514427) * r + 133.14166789178437745) * r +
3.387132872796366608) / (((((((r * 5226.495278852854561 +
28729.085735721942674) * r + 39307.89580009271061) * r +
21213.794301586595867) * r + 5394.1960214247511077) * r +
687.1870074920579083) * r + 42.313330701600911252) * r + 1.0);
closed with #2566
@spinkney: Thanks for closing the issue. If you had included the string "fixes #2555" in one of the commas for the PR, the issue would've been auto-closed.
I hadn't realized until checking now that the reverse-mode version just calls this function for the value, so this'll also speed up the autodiff version.
I don't know how permissive it is to take R source code and modify it for our use...but modifying R's
qnorm
function and adding it to Stan is about 2x faster than calling theinv_Phi
function.Running
Looking at the results seems good