stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
2.6k stars 370 forks source link

bivariate normal cdf #2356

Open bob-carpenter opened 7 years ago

bob-carpenter commented 7 years ago

Summary:

Add a function to compute the bivariate normal cdf.

Description:

For size 2 vectors y and mu and 2 x 2 covariance matrix Sigma, compute

bivar_normal_cdf(y, mu, Sigma)

  = INT_{u < y}  multinormal_pdf(u, mu, Sigma) d.u

where u is also a 2-vector and u < y is defined componentwise as u[1] < y[1] && u[2] < y[2].

There could also be a parameterization where covariance is given in terms of two scale parameters and a correlation parameter.

Additional Info:

This paper is often cited for an algorithm:

Current Version:

v2.16.0

bgoodri commented 7 years ago

If someone is going to implement the CDF, you might as well implement the PMF and other functions with scalar inputs rather than 2x2 matrices and 2-vectors.

On Fri, Jul 14, 2017 at 11:41 AM, Bob Carpenter notifications@github.com wrote:

Summary:

Add a function to compute the bivariate normal cdf. Description:

For size 2 vectors y and mu and 2 x 2 covariance matrix Sigma, compute

bivar_normal_cdf(y, mu, Sigma)

= INT_{u < y} multinormal_pdf(u, mu, Sigma) d.u

where u is also a 2-vector and u < y is defined componentwise as u[1] < y[1] && u[2] < y[2].

There could also be a parameterization where covariance is given in terms of two scale parameters and a correlation parameter. Additional Info:

This paper is often cited for an algorithm:

  • Boys, R.J., 1989. Algorithm AS R80: A remark on Algorithm AS 76: An integral useful in calculating noncentral t and bivariate normal probabilities. Journal of the Royal Statistical Society. Series C (Applied Statistics), 38(3), pp.580-582. https://www.jstor.org/stable/2347755

Current Version:

v2.16.0

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2356, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqq1fgvojoTBP3dz6Tq0Y6sQ4f3-4ks5sN4w1gaJpZM4OYaQb .

bgoodri commented 7 years ago

This article seems to have reversed the ratios in the numerators for the calculation of a1 and a2. It also does not deal with the 0 / 0 case. Here is a correct Stan function for it

  real binormal_cdf(real z1, real z2, real rho) {
    if (z1 != 0 || z2 != 0) {
      real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
      real a1 = (z2 / z1 - rho) / denom;
      real a2 = (z1 / z2 - rho) / denom;
      real product = z1 * z2;
      real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
      return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
    }
    return 0.25 + asin(rho) / (2 * pi());
  }
bgoodri commented 7 years ago

There is also https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2924071 , which might be better for the log-CDF.

bgoodri commented 7 years ago

For completeness, I perhaps should have added that if rho = 1, then the bivariate CDF is min(Phi(z1), Phi(z2)) and if rho = -1, it is Phi(z1) + Phi(z2) - 1.

khakieconomics commented 7 years ago

This is extremely handy. Thanks @bgoodri

rtrangucci commented 6 years ago

@bob-carpenter Per Ben's comment about doing scalar inputs rather than 2-vectors:

If we're set on 2-vectors, I'd like to expose the std_binormal_integral(real y1, real y2, real rho) as a function that takes scalars. The usage pattern for the std_binormal_lcdf function really doesn't lend itself to a vector input, but rather two scalar inputs. We'll usually be using the std_binormal_lcdf in something like a bivariate probit, where [y1, y2]' is a vector of vars. So the Stan code would need to do something like:

data {
  int N;
  int K1;
  int K2;
  matrix[N, K1] X1;
  matrix[N, K1] X2;
}
parameters {
  real<lower=-1, upper=1> rho;
  vector[K1] beta1;
  vector[K2] beta2;
}
model {
  vector[N] mu1 = X1 * beta1;
  vector[N] mu2 = X2 * beta2;
  for (n in 1:N)
    target += std_binormal_lcdf([mu1[n], mu2[n]]' | rho);
  ...

where we lose the advantage of the vectorization. In order to get the benefit of vectorization (i.e. caching the rho value that is repeated for every observation), we'd need to instead code the model like:

model {
...
  vector[2] model_mu[N];
  for (n in 1:N) {
    model_mu[n] = [mu1[n], mu2[n]]';
  }
  target += std_binormal_lcdf(model_mu | rho);
  ...

This looks funny to me, and seems inefficient, because we won't be respecting the memory locality in mu1 and mu2 and we need an extra loop. Here's the way it would work with the way I've coded it now:

model {
...
  target += std_binormal_lcdf(mu1 | mu2, rho);
  ...

That also looks strange to me given the placement of the conditioning bar. Maybe allowing for matrix inputs would make sense? something like

model {
  matrix[N, 2] mu;
  mu[,1] = X1 * beta1;
  mu[,2] = X2 * beta2;
  target += std_binormal_lcdf(mu | rho);
  ...

That I could get behind, and it would perhaps speed up internal computations.

bob-carpenter commented 6 years ago

Thanks. That makes the problem clear. Are you really going to have different number of predictors for the two entries so you can't just make it matrix multiplication? Anyway, it's the same problem either way.

Even understanding the problem, I'm still opposed to violating statistical convention with something like this:

std_binormal_lcdf(mu1 | mu2, rho);

I'd be happier with something like this:

std_binormal_lcdf({ mu1, mu2 } | rho);

but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work.

We've decided in the past that having matrixes pack up vectorizations would be too confusing, or we could do something like this:

std_binormal_lcdf(append_col(mu1, mu2) | rho);

or it's transpose append_row(mu1', mu2').

I could live with this:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

The memory locality's not an issue as you're just walking down two vectors, so everything's local on both sides. It is some minimal overhead to allocate and copy.

Or I want to generalize our input syntax to allow

std_binormal_lcdf(mu1, mu2 | rho);

where both mu1 and mu2 are containers. But that's a ton of work.

bgoodri commented 6 years ago

matrix with 2 columns?

On Wed, Aug 22, 2018 at 10:26 AM Bob Carpenter notifications@github.com wrote:

Thanks. That makes the problem clear. Are you really going to have different number of predictors for the two entries so you can't just make it matrix multiplication? Anyway, it's the same problem either way.

Even understanding the problem, I'm still opposed to violating statistical convention with something like this:

std_binormal_lcdf(mu1 | mu2, rho);

I'd be happier with something like this:

std_binormal_lcdf({ mu1, mu2 } | rho);

but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work.

We've decided in the past that having matrixes pack up vectorizations would be too confusing, or we could do something like this:

std_binormal_lcdf(append_col(mu1, mu2) | rho);

or it's transpose append_row(mu1', mu2').

I could live with this:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

The memory locality's not an issue as you're just walking down two vectors, so everything's local on both sides. It is some minimal overhead to allocate and copy.

Or I want to generalize our input syntax to allow

std_binormal_lcdf(mu1, mu2 | rho);

where both mu1 and mu2 are containers. But that's a ton of work.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2356#issuecomment-415050860, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqqeJxccAOgVVsAMZ66h85-HtPJh8ks5uTWoUgaJpZM4OYaQb .

rtrangucci commented 6 years ago

Yeah, I agree with you that generalizing the syntax doesn't make sense for this case. In the case we've settled on:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

are you proposing we add a function to_array_of_2vectors(reals y1, reals y2) to the language? Won't that be the same internally as the Stan code:

vector[2] mu[N];
for (n in 1:N)
  mu[n] = [mu1[n], mu2[n]]';

?

If so, that's fine. I'll move forward with updating the lcdf and lpdf to accept 2-vectors as the argument and we can open a separate issue for to_array_of_2vectors.

rtrangucci commented 6 years ago

One further point, then I'll go work on the function. All of the math internal to the std_binormal_lpdf and std_binormal_lcdf function (the log-density/probability and the gradients) operates on scalars) so we'll be converting to 2-vectors only to convert to scalars internally.

bob-carpenter commented 6 years ago

Yes. I'm proposing adding:

vector[] to_array_of_2vectors(vector x, vector y);

where the two vectors have the same size and

to_array_of_2vectors(x, y)[i, 1] == x[i]
to_array_of_2vectors(x, y)[i, 2] == y[i]

But it's not very elegant, to say the least.

I could be convinced that the matrix-with-two-columns approach is the way to go, as that'd just be

append_col(x, y) ~ ...

It's just that we decided that it'd be confusing to read vectorization out of rows or columns of matrices.

bob-carpenter commented 6 years ago

I'm only concerned about the external client interface, not how it's implemented.

But having said that, if it's all scalars under the hood, how will vectorization help?

I'd like to hear from @mitzimorris and @seantalts and @VMatthijs on this whole discussion.

rtrangucci commented 6 years ago

Both std_binormal_lpdf and std_binormal_lcdf are implemented nearly identically to normal_lpdf: https://github.com/stan-dev/math/blob/feature/issue-962-bivar-norm/stan/math/prim/scal/prob/std_binormal_lcdf.hpp

bob-carpenter commented 6 years ago

I take it you meant binormal and not normal in those first two!

rtrangucci commented 6 years ago

Yes, you're right! Good catch, and I've edited the comment.

Upon a bit of reflection, it probably won't be much of a performance hit to just take in a list of 2-vectors given that we have to pull out. I'll use the vector_seq_view metaprogram instead of scalar_seq_view, and just pull the scalar elements out of each the N 2-vectors.