stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org

gp_dot_prod_cov allow vector sigma (ard) #1246

Open avehtari opened 5 years ago

avehtari commented 5 years ago

gp_dot_prod_cov currently allows only scalar sigma. It would be useful to allow vector sigma in addition to scalar sigma. This would be useful for faster inference in sparse models with p>n and scale mixtures of normal priors such as the horseshoe. Integration over the linear model weights could then be done analytically using gp_dot_prod_cov. The cost of analytic integration scales as O(n^3), but this is fine in many p>>n cases (e.g. I have many examples with n<100). In the horseshoe case, there would still be p local sigmas, but the posterior dimension is almost halved and the shape of the posterior should also be easier, so we should be able to see higher effective sample sizes.

Since, e.g., gp_exp_quad_cov already allows a vector lengthscale, it can be used as an example of how to add vector sigma; the function signature for the sigma part would look similar.
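
For context, a sketch of the marginalization this enables (notation added here, not in the original comment): if f = X * beta with independent priors beta_j ~ normal(0, sigma_j), then integrating out beta gives cov(f) = X diag(sigma^2) X^T, i.e. the kernel k(x1, x2) = sum_j sigma_j^2 * x1_j * x2_j, which is exactly the dot product kernel with a vector sigma.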

bbbales2 commented 4 years ago

What does a vector sigma look like here? Rasmussen and Williams says the dot product kernel is:

k(x1, x2) = sigma^2 + x1^T * x2

How does a sigma that's not a scalar work in there?

Or are we looking for adding another parameter Sigma:

k(x1, x2) = sigma^2 + x1^T Sigma x2 ?

avehtari commented 4 years ago

If x is augmented with a column of 1's, the first sigma can be included in Sigma. Currently Sigma is a scalar. The request is to allow it to be a diagonal matrix, but in practice we don't need to form the matrix Sigma explicitly; instead we just scale each column of x by the square root of the corresponding diagonal element, so that sigma[j] = sqrt(Sigma[j,j]). The argument would then be a vector sigma. Did this clarify?
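
A minimal sketch of that column-scaling trick, assuming Eigen types (a hypothetical helper, not Stan Math code):

#include <Eigen/Dense>

// Scale column j of x by sigma[j]. Then x_scaled * x_scaled.transpose()
// equals x * diag(sigma^2) * x.transpose(), but diag(sigma^2) is never
// materialized: asDiagonal() is a lazy diagonal view, not a dense matrix.
Eigen::MatrixXd scale_columns(const Eigen::MatrixXd& x,
                              const Eigen::VectorXd& sigma) {
  return x * sigma.asDiagonal();
}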

drezap commented 4 years ago

In RW, sigma is the intercept. Here, the vector sigma would act as regression coefficients.

So k(x, x') = alpha_0^2 + x^T diag(sigma^2) x'.

We'd have to rename accordingly.


bbbales2 commented 4 years ago

Aah, got it, thanks both

bbbales2 commented 4 years ago

Actually, a question @avehtari: should I implement this just as:

k(x1, x2) = x1^T diag(sigma^2) x2?

Or should I add another argument to do:

k(x1, x2) = alpha_0 + x1^T diag(sigma^2) x2?

avehtari commented 4 years ago

k(x1, x2) = x1^T diag(sigma^2) x2?

I'm fine with this (just don't explicitly create the diagonal matrix)
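
A minimal sketch of that computation (hypothetical code, not the actual Stan Math implementation):

#include <Eigen/Dense>

// k(x1, x2) = x1^T diag(sigma^2) x2 as a weighted dot product,
// so the diagonal matrix is never formed.
double dot_prod_cov_entry(const Eigen::VectorXd& x1,
                          const Eigen::VectorXd& x2,
                          const Eigen::VectorXd& sigma) {
  double k = 0.0;
  for (int j = 0; j < x1.size(); ++j)
    k += sigma[j] * sigma[j] * x1[j] * x2[j];
  return k;
}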

betanalpha commented 4 years ago

I guess the question is whether you are implementing the Gram matrix or all of the efficient Cholesky/inner products/determinants calculations. If it’s just the Gram matrix then the pure dot product is fine as the constant can be added to each element afterwards with no problem. If it’s the latter then I think we’d want the more general one.


bbbales2 commented 4 years ago

@betanalpha I don't have any plans to make this efficient. Just being a code-robot on this one. But we should still get the interface right so it doesn't have to change.

@avehtari this is up to you. We can do a std::vector of sigmas, or we can do a matrix Sigma, or we can do a Cholesky factor of Sigma, I guess.

bbbales2 commented 4 years ago

(they're all easy enough to implement)

avehtari commented 4 years ago

All _cov things are currently just creating matrices, so we can go with that in this case, too. Right now, as mentioned in the issue, I'm hoping for just the modification that, in addition to allowing scalar sigma, vector sigma is allowed. If someone comes up with a use case for a matrix (Cholesky) Sigma, we can add that in another PR.

bbbales2 commented 4 years ago

Just to clarify: when the input is a std::vector of Eigen::VectorXds, I'm only allowing the vector sigma; I'm not allowing a scalar in that case. That this is consistent with how the lengthscales are handled in the other cases is my excuse not to code more.

When the input is a std::vector of doubles, I'm going with a scalar sigma.
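
A rough sketch of the interface being described (simplified and hypothetical; the real Stan Math functions are templated over autodiff types):

#include <vector>
#include <Eigen/Dense>

// Multi-dimensional inputs: one sigma per input dimension (ARD only).
Eigen::MatrixXd gp_dot_prod_cov(const std::vector<Eigen::VectorXd>& x,
                                const Eigen::VectorXd& sigma);

// One-dimensional inputs: a single scalar sigma.
Eigen::MatrixXd gp_dot_prod_cov(const std::vector<double>& x, double sigma);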

avehtari commented 4 years ago

I didn't understand the last message. I thought we could have two different function signatures, one with scalar and one with vector sigma. Also, I don't know the difference between std::vector and Eigen::VectorXd well enough; I just know what is declared in Stan (and it would be nice to actually support both vector and array, as this is something that most often trips people up).

bbbales2 commented 4 years ago

We could try to make this easy like the _lpdf functions (where in a lot of cases vectors and arrays can be swapped around), but to make things consistent I'd need to also do it to the other gp_*_cov functions. I tried this last week and it wasn't easy, so I just bailed.

[edit: removed accidental italics]

avehtari commented 4 years ago

Ok, then do it in a way which is easy for you. Maybe things are easier with stanc3?

bbbales2 commented 4 years ago

Unfortunately I think this is limited by my actual laziness, not stanc3 :D. I'll go with @betanalpha's signature since it's more general. Hope that doesn't screw up the performance of whatever you had in mind.

betanalpha commented 4 years ago

Just for context, this was the original spec for adding new covariance functions:

https://github.com/stan-dev/stan/wiki/Adding-a-Gaussian-Process-Covariance-Function

The idea was to have optimized Cholesky factorizations and all that good stuff for everything.

That spec hasn’t been followed so I don’t want to obstruct this PR, but it would be great if we could go closer to that spec in the future.


avehtari commented 4 years ago

The idea was to have optimized Cholesky factorizations and all that good stuff for everything.

I think we should reconsider that plan. When we discussed that, we forgot how likely it is that people want to use GPs as part of a bigger model and that covariance functions are often combined. Then, to get the speedup, that good stuff would need to be implemented for each combination separately. We could decide on some subset (e.g. combinations of a linear model plus expquad/Matérns), but it would be much more flexible if we could have covariance functions with a functor approach instead. It's still unclear how to implement such a functor approach efficiently, so it may take some time, but meanwhile could we decide whether we officially put that original spec on hold?
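
For illustration only, a minimal sketch of the shape such a functor approach could take (not a Stan design proposal):

#include <functional>
#include <Eigen/Dense>

using Kernel =
    std::function<double(const Eigen::VectorXd&, const Eigen::VectorXd&)>;

// The sum of two covariance functions is itself a covariance function,
// so composite kernels can be built by combining kernel functors.
Kernel add_kernels(Kernel k1, Kernel k2) {
  return [k1, k2](const Eigen::VectorXd& a, const Eigen::VectorXd& b) {
    return k1(a, b) + k2(a, b);
  };
}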

betanalpha commented 4 years ago

I think we should reconsider that plan. When we discussed that, we forgot how likely it is that people want to use GPs as part of a bigger model and that covariance functions are often combined.

We did not. This was the main decision that had to be made: do we optimize for individual kernels or for composite kernels? Because few if any people were using composite kernels at the time, we ultimately went with the optimizations for individual kernels.

Then, to get the speedup, that good stuff would need to be implemented for each combination separately. We could decide on some subset (e.g. combinations of a linear model plus expquad/Matérns), but it would be much more flexible if we could have covariance functions with a functor approach instead. It's still unclear how to implement such a functor approach efficiently, so it may take some time, but meanwhile could we decide whether we officially put that original spec on hold?

Right, we decided that implementing such an algebra of kernels (which would probably require expression templates) was too complicated, and there was not yet sufficient user demand to try to figure it out. Hence the decision to focus on optimizing individual kernels.

While composite kernels do have interesting applications, I have yet to see them in common use, so I'm not sure we are yet ready to consider moving towards a more elaborate system. Happy to consider evidence to the contrary.

spinkney commented 2 years ago

@rok-cesnovar can we close this?

rok-cesnovar commented 2 years ago

I don't think so. The currently supported signatures are:

gp_dot_prod_cov(array[] real, real) => matrix
gp_dot_prod_cov(array[] real, array[] real, real) => matrix
gp_dot_prod_cov(array[] vector, real) => matrix
gp_dot_prod_cov(array[] vector, array[] vector, real) => matrix

so this was never done.
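
The requested vector-sigma variants would presumably look something like (hypothetical, mirroring the existing style):

gp_dot_prod_cov(array[] vector, vector) => matrix
gp_dot_prod_cov(array[] vector, array[] vector, vector) => matrix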