better derivative of lgamma and digamma stability

bob-carpenter commented 7 years ago

Summary:

From @bgoodri on http://discourse.mc-stan.org/t/numerical-error-tanks-stepsize-other-possibilities/293/3

For the complete Gamma function, I was thinking the other day that Stan could do better. It is known

https://www.vttoth.com/CMS/projects/41-the-lanczos-approximation1

how to get arbitrary precision for the log-Gamma function, which Boost basically already implements (with some additional edge cases). But in the Stan Math Library, we naively call the digamma function (in doubles) to compute the derivative, which can be less accurate. So, we could try differentiating the Lanczos / Godfrey approximation directly, which is not difficult, definitely faster, and possibly more accurate.

Current Version:

v2.14.0

bgoodri commented 7 years ago

Paul Godfrey, who was the guy that really came up with this approximation, said (with reference to the Gamma function, rather than the log-Gamma function) http://my.fit.edu/%7Egabdo/gamma.txt

The best performance was exhibited by g = 607/128 with a relative error less than 10E-15 for real z in the range 0< z <171 ... The 15 coefficients are c = [ 0.99999999999999709182 57.156235665862923517 -59.597960355475491248 14.136097974741747174 -0.49191381609762019978 .33994649984811888699e-4 .46523628927048575665e-4 -.98374475304879564677e-4 .15808870322491248884e-3 -.21026444172410488319e-3 .21743961811521264320e-3 -.16431810653676389022e-3 .84418223983852743293e-4 -.26190838401581408670e-4 .36899182659531622704e-5] for g = 607/128 = 4.7421875

betanalpha commented 7 years ago

I argue that most of these claims are completely unsubstantiated. See, for example, the discussion in Boost’s doc itself: http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/math_toolkit/sf_gamma/digamma.html http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/math_toolkit/sf_gamma/digamma.html.

On Mar 12, 2017, at 9:53 AM, bgoodri notifications@github.com wrote:

Paul Godfrey, who was the guy that really came up with this approximation, said (with reference to the Gamma function, rather than the log-Gamma function) http://my.fit.edu/%7Egabdo/gamma.txt http://my.fit.edu/%7Egabdo/gamma.txt The best performance was exhibited by g = 607/128 with a relative error less than 10E-15 for real z in the range 0< z <171 ... The 15 coefficients are c = [ 0.99999999999999709182 57.156235665862923517 -59.597960355475491248 14.136097974741747174 -0.49191381609762019978 .33994649984811888699e-4 .46523628927048575665e-4 -.98374475304879564677e-4 .15808870322491248884e-3 -.21026444172410488319e-3 .21743961811521264320e-3 -.16431810653676389022e-3 .84418223983852743293e-4 -.26190838401581408670e-4 .36899182659531622704e-5] for g = 607/128 = 4.7421875

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/505#issuecomment-285946049, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdNlug3xc6jDslVAJ3HaiwEhcMaRytvks5rk_jwgaJpZM4MadBV.

bgoodri commented 7 years ago

What claims are unsubstantiated? He got that precision for the Gamma function. You linked to the Boost doc for the digamma function. For the log-Gamma function with fixed precision Boost uses slightly different values of g and n http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/math_toolkit/lanczos.html .

betanalpha commented 7 years ago

"So, we could try differentiating the Lanczos / Godfrey approximation directly, which is not difficult, definitely faster, and possibly more accurate.”

On Mar 12, 2017, at 1:34 PM, bgoodri notifications@github.com wrote:

What claims are unsubstantiated? He got that precision for the Gamma function. You linked to the Boost doc for the digamma function. For the log-Gamma function with fixed precision Boost uses slightly different values of g and n http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/math_toolkit/lanczos.html . — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/505#issuecomment-285960198, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdNlk9gZZOJqGmzZVi45NSc1tK8VdYeks5rlCyVgaJpZM4MadBV.

bgoodri commented 7 years ago

The derivative is not difficult because it is mostly just log() stuff. It is definitely faster than separately calling lgamma and digamma from Boost because some of the digamma calculations are shared with the lgamma calculations. It is possibly more accurate (I doubt by much) because we would be differentiating the approximation being used rather than approximating the true derivative.

On Sun, Mar 12, 2017 at 1:59 PM, Michael Betancourt < notifications@github.com> wrote:

"So, we could try differentiating the Lanczos / Godfrey approximation directly, which is not difficult, definitely faster, and possibly more accurate.”

On Mar 12, 2017, at 1:34 PM, bgoodri notifications@github.com wrote:

What claims are unsubstantiated? He got that precision for the Gamma function. You linked to the Boost doc for the digamma function. For the log-Gamma function with fixed precision Boost uses slightly different values of g and n http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/ math_toolkit/lanczos.html . — You are receiving this because you commented. Reply to this email directly, view it on GitHub < https://github.com/stan-dev/math/issues/505#issuecomment-285960198>, or mute the thread https://github.com/notifications/unsubscribe-auth/ ABdNlk9gZZOJqGmzZVi45NSc1tK8VdYeks5rlCyVgaJpZM4MadBV.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/505#issuecomment-285961909, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqnU3XGErvxMzBFhn8UomZzTWHltxks5rlDJ0gaJpZM4MadBV .

bob-carpenter commented 7 years ago

Approximating the true derivative will give you better precision w.r.t. the true value than differentiating the approximation of the value.

But then the question is given that the approximation of the value is the function being used, maybe HMC is better off differentiating the approximation?

betanalpha commented 7 years ago

The digamma doesn’t even use any transcendental function calls — it’s all rational function approximations and asymptotic expansions, hence the speed. The bigger challenge is that the radii of convergence are different than for the various log gamma approximations, or in other words the optimal branches for evaluation are different. So you run into the typical problems of either (a) lgamma converges to the required precision but the derivative doesn’t so you lose the desired precision or (b) the derivative converges much more quickly and you end up wasting computation.

Regardless of all the technical details, lgamma and digamma in boost are really fast and very precise. There is no way that they are the source of computational problems unless the units are so poorly chosen that we’re running into floating point precision issues anyways.

I went through all of this with the beta CDF — needing to evaluate it, and hence its derivative, at extreme values causing the naive power series expansion to fail. Only by being careful to approximate the ratio of terms instead of the two terms independently and then putting a ton of work into determining the optimal branches for the ratio calculation was I able to get the needed accuracy but then all of the problems went away. The same thing will need to be done for the gamma CDF hence my sharing the notes way at the beginning when Krzysztof first mentioned issues.

On Mar 12, 2017, at 2:50 PM, bgoodri notifications@github.com wrote:

The derivative is not difficult because it is mostly just log() stuff. It is definitely faster than separately calling lgamma and digamma from Boost because some of the digamma calculations are shared with the lgamma calculations. It is possibly more accurate (I doubt by much) because we would be differentiating the approximation being used rather than approximating the true derivative.

On Sun, Mar 12, 2017 at 1:59 PM, Michael Betancourt < notifications@github.com> wrote:

"So, we could try differentiating the Lanczos / Godfrey approximation directly, which is not difficult, definitely faster, and possibly more accurate.”

On Mar 12, 2017, at 1:34 PM, bgoodri notifications@github.com wrote:

What claims are unsubstantiated? He got that precision for the Gamma function. You linked to the Boost doc for the digamma function. For the log-Gamma function with fixed precision Boost uses slightly different values of g and n http://www.boost.org/doc/libs/1_63_0/libs/math/doc/html/ math_toolkit/lanczos.html . — You are receiving this because you commented. Reply to this email directly, view it on GitHub < https://github.com/stan-dev/math/issues/505#issuecomment-285960198>, or mute the thread https://github.com/notifications/unsubscribe-auth/ ABdNlk9gZZOJqGmzZVi45NSc1tK8VdYeks5rlCyVgaJpZM4MadBV.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/505#issuecomment-285961909, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqnU3XGErvxMzBFhn8UomZzTWHltxks5rlDJ0gaJpZM4MadBV .

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/505#issuecomment-285965403, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdNlqFaCNmuAiO4AXmoPGT4ihl82v1Xks5rlD5ggaJpZM4MadBV.

stan-dev / math

better derivative of lgamma and digamma stability #505

Summary:

Current Version: