stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
745 stars 187 forks source link

binomial_coefficient_log produces wrong results when N >> n #1592

Closed martinmodrak closed 4 years ago

martinmodrak commented 4 years ago

Description

For some values, the results from binomial_coefficient_log don't match those computed by R or Mathematica.

In #239, it is noted that we don't know, where the approximation we use for the case when N >> n comes from. I also don't understand it. R uses a quite a different approximation - their implementation is

double attribute_hidden lfastchoose(double n, double k)
{
    return -log(n + 1.) - lbeta(n - k + 1., k + 1.);
}

Stirling approximation produce good results for some of the cases, but also terrible results for others... The R's implementation seems to work neatly, but depends on quite a bunch of other code to compute the lbeta.

Example

In Stan:

binomial_coefficient_log(1e20, 100) = 4141.4308104325282
binomial_coefficient_log(845000, 8.45e-10) =  1.1925051146203358e-008

Expected Output

Compare to Mathematica, Rs lchoose and Stirling approximation implemented in R via

lchoose_stirling <- function(n, k) {
  -(0.5) * (log(2) + log(pi) + log(k)) + k * (log(n) + 1 - log(k))
}

In both cases, there is a mismatch in second digit. https://www.wolframalpha.com/input/?i=log%2810%5E20+choose+100%29

Stan:  4141.4308104325282
Math:  4241.430810432527877842402915999072777157961556194658677330
R:     4241.4308104325273
Stir:  4241.4316437630832

https://www.wolframalpha.com/input/?i=log%28845000+choose+845*10%5E-12%29

Stan:  1.1925051146203358e-008
Math:  1.2019540397111151911538109607610600259536643522634943 × 10^-8
R:     0 #Not great
R-cor: 1.2019542694474694e-08
Stir:  9.5269037411112816 #Even worse

Here R-cor is computed using the lfastchoose formula directly, e.g.:

lchoose_lbeta <- function(n, k) {
  - log1p(n) -lbeta(n - k + 1, k + 1) 
}

for some reason, lchoose calls a different computation method for those parameters

Current Version:

v3.0.0

martinmodrak commented 4 years ago

This also suggests that binomial and beta densities could be subject to imprecisions :-(

bob-carpenter commented 4 years ago

Thanks for catching and debugging this. We should definitely fix. We can use any approximation that works.

On Jan 7, 2020, at 8:01 AM, Martin Modrák notifications@github.com wrote:

Description

For some values, the results from binomial_coefficient_log don't match those computed by R or Mathematica.

In #239, it is noted that we don't know, where the approximation we use for the case when N >> n comes from. I also don't understand it. R uses a quite a different approximation - their implementation is

double attribute_hidden lfastchoose(double n, double k) { return -log(n + 1.) - lbeta(n - k + 1., k + 1.); }

Stirling approximation also produces the correct results in the cases I've found.

Example

In Stan:

binomial_coefficient_log(1e20, 100) = 4141.4308104325282 binomial_coefficient_log(845000, 8.45e-10) = 1.1925051146203358e-008

Expected Output

Compare to Mathematica, Rs lchoose https://www.wolframalpha.com/input/?i=log%2810%5E20+choose+100%29

Stan: 4141.4308104325282 Math: 4241.430810432527877842402915999072777157961556194658677330 R: 4241.4308104325273

https://www.wolframalpha.com/input/?i=log%28845000+choose+845*10%5E-12%29

Stan: 1.1925051146203358e-008 Math: 1.2019540397111151911538109607610600259536643522634943 × 10^-8 R: 0 #Not great

Current Version:

v3.0.0

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

martinmodrak commented 4 years ago

So I made a mistake in the initial description - Stirling approximation works reasonably only for some of the problematic cases (description now updated). Searching online, the links I found either say "use lgamma" (which we know is not enough) or "steal code from R", which we can't do as it is GPL. And the idea in R's code is (for me) non-trivial to adapt. The main workhorse in R is a function called lgamma_cor which computes the difference between Stirling's approximation to lgamma and actual lgamma value. They then proceed by computing the Stirling's approximation for lbeta and adding the corrections separately. Unfortunately, the code for lgamma_cor is opaque to me, so I can't reimplement it to avoid licensing issues. It uses something called chebyshev_eval which is probably related to Chebyshev polynomials, which have their boost implementation but I am totally out of my depth here and don't understand the surrounding math at all :-(

martinmodrak commented 4 years ago

Also, if I read the code correctly, Stan now relies on autodiff for derivatives of binomial_coefficient_log (I believe Stan's lchoose delegates to binomial_coefficient_log in math and there is no rev version of the function. But autodiffing through a complex approximation (as the one in R) might be undesirable, so analytic derivatives might need to be developed and tested as well. We do have analytic derivatives for lbeta implementation we have, but I would suspect they would fail where the current implementation fails.