bob-carpenter opened this issue 7 years ago
If someone is going to implement the CDF, you might as well implement the PDF and other functions with scalar inputs rather than 2x2 matrices and 2-vectors.
On Fri, Jul 14, 2017 at 11:41 AM, Bob Carpenter notifications@github.com wrote:
Summary:
Add a function to compute the bivariate normal cdf.
Description:
For size 2 vectors y and mu and 2 x 2 covariance matrix Sigma, compute
bivar_normal_cdf(y, mu, Sigma)
  = INT_{u < y} multinormal_pdf(u, mu, Sigma) d.u
where u is also a 2-vector and u < y is defined componentwise as u[1] < y[1] && u[2] < y[2].
There could also be a parameterization where covariance is given in terms of two scale parameters and a correlation parameter.
Additional Info:
This paper is often cited for an algorithm:
- Boys, R.J., 1989. Algorithm AS R80: A remark on Algorithm AS 76: An integral useful in calculating noncentral t and bivariate normal probabilities. Journal of the Royal Statistical Society. Series C (Applied Statistics), 38(3), pp.580-582. https://www.jstor.org/stable/2347755
Current Version:
v2.16.0
This article seems to have reversed the ratios in the numerators for the calculation of a1 and a2. It also does not deal with the 0 / 0 case. Here is a correct Stan function for it:
real binormal_cdf(real z1, real z2, real rho) {
  if (z1 != 0 || z2 != 0) {
    real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
    real a1 = (z2 / z1 - rho) / denom;
    real a2 = (z1 / z2 - rho) / denom;
    real product = z1 * z2;
    real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
    // decomposition of the standard bivariate normal CDF in terms of Owen's T function
    return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
  }
  // z1 == z2 == 0: closed form at the origin
  return 0.25 + asin(rho) / (2 * pi());
}
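For the (y, mu, Sigma) parameterization in the issue summary, a thin wrapper can standardize each margin and reuse the function above. This is only a minimal sketch of mine, assuming binormal_cdf is defined in the same functions block; it is not something proposed in the thread:
// hypothetical wrapper for the (y, mu, Sigma) parameterization:
// standardize each margin and call the standard binormal CDF above
real bivar_normal_cdf(vector y, vector mu, matrix Sigma) {
  real sigma1 = sqrt(Sigma[1, 1]);
  real sigma2 = sqrt(Sigma[2, 2]);
  real rho = Sigma[1, 2] / (sigma1 * sigma2);
  return binormal_cdf((y[1] - mu[1]) / sigma1,
                      (y[2] - mu[2]) / sigma2,
                      rho);
}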
There is also https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2924071, which might be better for the log-CDF.
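For context, the naive log-CDF would just wrap the function above in a log. A minimal sketch (the name is mine), which is exactly what a dedicated log-CDF algorithm like the one in that paper would improve on when the CDF underflows in the tails:
// naive log-CDF: fine away from the tails, but loses precision once
// binormal_cdf underflows, which a dedicated log-CDF algorithm avoids
real binormal_log_cdf(real z1, real z2, real rho) {
  return log(binormal_cdf(z1, z2, rho));
}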
For completeness, I perhaps should have added that if rho = 1, then the bivariate CDF is min(Phi(z1), Phi(z2)), and if rho = -1, it is max(Phi(z1) + Phi(z2) - 1, 0).
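A sketch of how those limiting cases could be guarded before falling through to the general function above (the wrapper name is mine, not anything from the thread):
// limiting cases: rho = 1 is the comonotone pair, rho = -1 the
// countermonotone pair; otherwise use the general formula above
real binormal_cdf_safe(real z1, real z2, real rho) {
  if (rho == 1)
    return fmin(Phi(z1), Phi(z2));
  if (rho == -1)
    return fmax(0, Phi(z1) + Phi(z2) - 1);
  return binormal_cdf(z1, z2, rho);
}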
This is extremely handy. Thanks @bgoodri
@bob-carpenter Per Ben's comment about doing scalar inputs rather than 2-vectors: if we're set on 2-vectors, I'd like to expose std_binormal_integral(real y1, real y2, real rho) as a function that takes scalars. The usage pattern for the std_binormal_lcdf function really doesn't lend itself to a vector input, but rather two scalar inputs. We'll usually be using std_binormal_lcdf in something like a bivariate probit, where [y1, y2]' is a vector of vars. So the Stan code would need to do something like:
data {
  int N;
  int K1;
  int K2;
  matrix[N, K1] X1;
  matrix[N, K2] X2;
}
parameters {
  real<lower=-1, upper=1> rho;
  vector[K1] beta1;
  vector[K2] beta2;
}
model {
  vector[N] mu1 = X1 * beta1;
  vector[N] mu2 = X2 * beta2;
  for (n in 1:N)
    target += std_binormal_lcdf([mu1[n], mu2[n]]' | rho);
  ...
where we lose the advantage of the vectorization. In order to get the benefit of vectorization (i.e. caching the rho value that is repeated for every observation), we'd need to instead code the model like:
model {
  ...
  vector[2] model_mu[N];
  for (n in 1:N) {
    model_mu[n] = [mu1[n], mu2[n]]';
  }
  target += std_binormal_lcdf(model_mu | rho);
  ...
This looks funny to me, and seems inefficient, because we won't be respecting the memory locality in mu1 and mu2 and we need an extra loop. Here's the way it would work with the way I've coded it now:
model {
  ...
  target += std_binormal_lcdf(mu1 | mu2, rho);
  ...
That also looks strange to me, given the placement of the conditioning bar. Maybe allowing for matrix inputs would make sense? Something like:
model {
  matrix[N, 2] mu;
  mu[, 1] = X1 * beta1;
  mu[, 2] = X2 * beta2;
  target += std_binormal_lcdf(mu | rho);
  ...
That I could get behind, and it would perhaps speed up internal computations.
Thanks. That makes the problem clear. Are you really going to have a different number of predictors for the two entries, so you can't just make it matrix multiplication? Anyway, it's the same problem either way.
Even understanding the problem, I'm still opposed to violating statistical convention with something like this:
std_binormal_lcdf(mu1 | mu2, rho);
I'd be happier with something like this:
std_binormal_lcdf({ mu1, mu2 } | rho);
but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work.
We've decided in the past that having matrices pack up vectorizations would be too confusing. Or we could do something like this:
std_binormal_lcdf(append_col(mu1, mu2) | rho);
or its transpose append_row(mu1', mu2').
I could live with this:
std_binormal_lcdf(to_array_of_2vectors(mu1, mu2) | rho);
The memory locality's not an issue, as you're just walking down two vectors, so everything's local on both sides. There is some minimal overhead to allocate and copy.
Or I want to generalize our input syntax to allow
std_binormal_lcdf(mu1, mu2 | rho);
where both mu1 and mu2 are containers. But that's a ton of work.
matrix with 2 columns?
Yeah, I agree with you that generalizing the syntax doesn't make sense for this case. In the case we've settled on:
std_binormal_lcdf(to_array_of_2vectors(mu1, mu2) | rho);
are you proposing we add a function to_array_of_2vectors(reals y1, reals y2) to the language? Won't that be the same internally as the following Stan code?
vector[2] mu[N];
for (n in 1:N)
  mu[n] = [mu1[n], mu2[n]]';
If so, that's fine. I'll move forward with updating the lcdf and lpdf to accept 2-vectors as the argument, and we can open a separate issue for to_array_of_2vectors.
One further point, then I'll go work on the function. All of the math internal to the std_binormal_lpdf and std_binormal_lcdf functions (the log-density/probability and the gradients) operates on scalars, so we'll be converting to 2-vectors only to convert back to scalars internally.
Yes. I'm proposing adding:
vector[] to_array_of_2vectors(vector x, vector y);
where the two vectors have the same size and
to_array_of_2vectors(x, y)[i, 1] == x[i]
to_array_of_2vectors(x, y)[i, 2] == y[i]
But it's not very elegant, to say the least.
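For what it's worth, a user-defined Stan function with the proposed semantics is only a few lines. A sketch of the user-space equivalent, not the built-in itself:
functions {
  // same behavior as the proposed built-in: pair up the n-th elements of
  // x and y as a 2-vector, so result[n, 1] == x[n] and result[n, 2] == y[n]
  vector[] to_array_of_2vectors(vector x, vector y) {
    int N = num_elements(x);
    vector[2] result[N];
    for (n in 1:N)
      result[n] = [x[n], y[n]]';
    return result;
  }
}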
I could be convinced that the matrix-with-two-columns approach is the way to go, as that'd just be
append_col(x, y) ~ ...
It's just that we decided that it'd be confusing to read vectorization out of rows or columns of matrices.
I'm only concerned about the external client interface, not how it's implemented.
But having said that, if it's all scalars under the hood, how will vectorization help?
I'd like to hear from @mitzimorris and @seantalts and @VMatthijs on this whole discussion.
Both std_binormal_lpdf and std_binormal_lcdf are implemented nearly identically to normal_lpdf: https://github.com/stan-dev/math/blob/feature/issue-962-bivar-norm/stan/math/prim/scal/prob/std_binormal_lcdf.hpp
I take it you meant binormal and not normal in those first two!
Yes, you're right! Good catch, and I've edited the comment.
Upon a bit of reflection, it probably won't be much of a performance hit to just take in a list of 2-vectors, given that we have to pull the scalars out anyway. I'll use the vector_seq_view metaprogram instead of scalar_seq_view, and just pull the scalar elements out of each of the N 2-vectors.