stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

Add squared exponential covariance function for Gaussian processes #191

Closed: syclik closed this 8 years ago

syclik commented 8 years ago

Feature Request

Version

2.8.0++

Description

Add a function for the squared exponential covariance function to make GPs easier in Stan. The covariance function: $$ k(x_i, x_j) = \sigma^2 \exp\left(-\frac{(x_i - x_j)^2}{2l^2}\right) $$

GPstuff calls this function gpcf_sexp; GPy calls it GPy.kern.src.rbf.

We should call this something reasonable.

@avehtari + @cunni: are these usually parameterized by the variance (sigma^2) and the length-scale (l)?

Example

We still need a reasonable name for the function, but the usage would look something like:

data {
  int N;
  vector[N] x;
}
parameters {
  real<lower = 0> variance;  // can't really use variance as a name here
  real<lower = 0> lengthscale; 
}
transformed parameters {
  matrix[N, N] cov;
  cov <- squared_exponential(x, variance, lengthscale);
}
...

Does that look about right?

Additional Information

What should we name this function?

dustinvtran commented 8 years ago

What about ARD kernels? Will they be a separate function, e.g., squared_exponential_ard?

rtrangucci commented 8 years ago

If the length-scale argument is vectorized the way density arguments are, the function could support both ARD and a single length scale.
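
For reference, the ARD form this would cover uses one length scale per input dimension (standard definition, added here for context):

$$ k(x_i, x_j) = \sigma^2 \exp\left(-\frac{1}{2} \sum_{d=1}^{D} \frac{(x_{id} - x_{jd})^2}{l_d^2}\right) $$

With $l_1 = \dots = l_D = l$ this reduces to the single-length-scale kernel above.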

avehtari commented 8 years ago

@avehtari + @cunni: are these usually parameterized by the variance (sigma^2) and the length-scale (l)?

Yes, but some also use l^2. To follow Stan's parameterization of normal, I propose parameterizing by sigma and l.

transformed parameters {
  matrix[N, N] cov;
  cov <- squared_exponential(x, variance, lengthscale);

Note that to make predictions for new data (of size Nt), we also need the N x Nt covariance matrix. GPstuff uses covtr when computing the covariance matrix with the training data only, i.e.

covtr <- covtr_sexp(X, variance, lengthscale);

and then when making predictions

cov <- cov_sexp(X, Xt, variance, lengthscale);
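
For context, these two matrices are what the standard GP predictive equations consume (GPML conventions, included here only as a reference point):

$$ \mu_* = K_*^\top K^{-1} y, \qquad \Sigma_* = K_{**} - K_*^\top K^{-1} K_*, $$

where $K$ is the N x N training covariance, $K_*$ the N x Nt training-test cross-covariance, and $K_{**}$ the Nt x Nt test covariance.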

What should we name this function?

My suggestion is covtr_sexp and cov_sexp

sakrejda commented 8 years ago

I'd advocate against the "sexp" shorthand because it's a data type (the data type?) in R's internals and it's really distracting to un-translate the shorthand every time.

avehtari commented 8 years ago

I should have been more exact: I like that the function names start with covtr and cov, but something other than sexp could be used instead.

betanalpha commented 8 years ago

sq_exp_cov

bgoodri commented 8 years ago

We may very well want to implement other kernels at some point in the future, in which case they should start with a common prefix so that they appear next to each other in the index. Maybe cov_sq_exp?

bob-carpenter commented 8 years ago

OK by me. I don't know how much it matters to have proximity in the index of function names. Of course, we really need a plain old topic index as well as a function index.

bgoodri commented 8 years ago

If we ever get function name completion in the editors, then having similar functions start with the same letters would be a good thing.

syclik commented 8 years ago

I want some help with the specification of the function. I've implemented one of these, but I want to know if the others should be implemented as well. Note: the types used are Stan types.

  1. matrix cov_sq_exp(matrix X, real sigma, real l) -- this is the one I implemented, where I assumed each row is a separate observation
  2. matrix cov_sq_exp(vector X[], real sigma, real l) -- this one seems pretty natural
  3. matrix cov_sq_exp(row_vector X[], real sigma, real l) -- I'd rather not implement this unless we need to
  4. matrix cov_sq_exp(real X[], real sigma, real l)
  5. matrix cov_sq_exp(matrix X[], real sigma, real l) -- this seems like a natural extension?

Is the order ok? That's X, then sigma, then l?

We have another set of functions to write that take two Xs. If it's more natural to write the arguments in another order, I'll assume the two Xs stay together.

Also, when we have two Xs, I'm only going to handle inputs with the same structure.
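
For concreteness, a minimal sketch of how signature 2 might look in a model, assuming the cov_sq_exp name and the (X, sigma, l) order stand:

data {
  int N;
  int D;
  vector[D] x[N];  // N inputs, each D-dimensional
}
parameters {
  real<lower = 0> sigma;
  real<lower = 0> l;
}
transformed parameters {
  matrix[N, N] cov;
  cov <- cov_sq_exp(x, sigma, l);
}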

betanalpha commented 8 years ago

Ultimately we’re going to want functions that take in the hyperparameters and return the complete GP log density, right? Are these orthogonal to such a goal?

syclik commented 8 years ago

Yes. These are orthogonal to this issue, but that's where we're eventually going with this.

rtrangucci commented 8 years ago

Do we want to implement the vectorized version?

matrix cov_sq_exp(matrix X, real sigma, vector l[])

bob-carpenter commented 8 years ago

This is an important point. Will we ever need this function if we have the full GP density? Will it help to put them together, or will the factored version be enough?

What would each look like from a user perspective?

mbrubake commented 8 years ago

Generic kernel functions could be useful beyond just GPs, e.g., for kernel methods with general elliptical distributions. Also, I'm hard pressed to imagine that we'll want to compound the GP density with the kernel function, because there are so many choices of kernels (and ways to combine them) that a GP density which prescribes a kernel function would be artificially limiting.

Also, as was mentioned elsewhere, lots of people do ARD with GPs, so matrix cov_sq_exp(..., vector l) is important and not exactly "vectorization" in the sense that we typically think of it.

cunni commented 8 years ago

Just catching up. A few points:

  1. All cov functions should start with cov_ . See http://www.gaussianprocess.org/gpml/code/matlab/cov/ for a list of covariance functions. One cannot really hope to cover the space, so being as extensible as possible is a sensible goal.
  2. Kernels will likely be desired much more often than GPs (kernel methods are a big class), so I favor accessible kernel functions, not just inside the GP likelihood.
  3. I suggest making cov_sq_exp accept a vector \ell, as has been discussed for ARD with GPs. It's a common use case. \ell should have the size of the input dimensionality of X.

bob-carpenter commented 8 years ago

On Oct 26, 2015, at 5:34 PM, cunni notifications@github.com wrote:

Just catching up. A few points:

  1. All cov functions should start with cov_ . See http://www.gaussianprocess.org/gpml/code/matlab/cov/ for a list of covariance functions. One cannot really hope to cover the space, so being as extensible as possible is a sensible goal.

How should they be prioritized? I assume cov_sq_exp should be first.

After doing one, we can try to recruit help to do more.

  2. Kernels will likely be desired much more often than GPs (kernel methods are a big class), so I favor accessible kernel functions, not just inside the GP likelihood.

OK --- sounds like a consensus to me. And much much easier to implement and doc.

  3. I suggest making cov_sq_exp accept a vector \ell, as has been discussed for ARD with GPs. It's a common use case. \ell should have the size of the input dimensionality of X.

That should be easy --- we'll just need to broadcast a scalar in that position. That is, if we provide a scalar l, it acts as a vector of K copies of l, where K is the number of input dimensions.
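
As a rough illustration of what the ARD version would compute, here is a sketch in Stan itself (the name cov_sq_exp_ard and the explicit size arguments are illustrative assumptions, not the proposed API):

functions {
  // Hypothetical ARD squared exponential: one length scale per input dimension.
  matrix cov_sq_exp_ard(int N, int D, vector[] x, real sigma, vector l) {
    matrix[N, N] K;
    for (i in 1:N) {
      for (j in i:N) {
        real d2;
        d2 <- 0;
        for (d in 1:D)
          d2 <- d2 + square((x[i][d] - x[j][d]) / l[d]);
        K[i, j] <- square(sigma) * exp(-0.5 * d2);
        K[j, i] <- K[i, j];  // covariance matrices are symmetric
      }
    }
    return K;
  }
}

Broadcasting a scalar l would then amount to treating it as rep_vector(l, D).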

avehtari commented 8 years ago

I guess that after squared exponential, the other covariance functions in the birthday example would be nice :)

syclik commented 8 years ago

The priority right now is cov_sq_exp without ARD. I spoke to Aki about this, and I think that to calculate the derivatives for the ARD version we'll have to be clever about memory usage.

I think we should tackle this one first and see what sort of speed gains we get. We can handle the general vector \ell case as a second pass.

We can also recruit volunteers for the next set of covariance functions.

One other thing Aki and I briefly discussed was a version that returns the Cholesky factor of the covariance kernel. That's another thing we can do next.
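
For context, the pattern such a fused version would streamline looks roughly like this (a sketch; cov_sq_exp is the function proposed in this issue, the rest are existing built-ins):

transformed parameters {
  matrix[N, N] L;
  // in practice a small diagonal jitter is usually added before factoring
  L <- cholesky_decompose(cov_sq_exp(x, sigma, l));
}
model {
  y ~ multi_normal_cholesky(mu, L);
}

A hypothetical cov_sq_exp_chol could return L directly and share work between the kernel evaluation and the factorization.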

betanalpha commented 8 years ago

  1. All cov functions should start with cov_ . See http://www.gaussianprocess.org/gpml/code/matlab/cov/ for a list of covariance functions. One cannot really hope to cover the space, so being as extensible as possible is a sensible goal.

Covariance functions are going to be a rabbit hole because we also have to consider their algebraic structure — i.e., we can add and multiply covariance functions to generate new covariance functions. Given the utility of adding a bunch of covariance functions together, this isn’t a pedantic point. The problem is in implementation, and I’m not sure how we’d do it outside of expression templates or something of that sort.
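
The closure properties in question are standard: if $k_1$ and $k_2$ are positive-definite kernels, then so are

$$ k_1 + k_2, \qquad k_1 \cdot k_2, \qquad c \, k_1 \quad (c > 0), $$

which is exactly the algebra a model like the birthday example exploits.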

  2. Kernels will likely be desired much more often than GPs (kernel methods are a big class), so I favor accessible kernel functions, not just inside the GP likelihood.

But the Stan modeling language is probabilistic in nature, and the only decent measures floating around are GPs. So what kernel method is not going to end with a multivariate Gaussian or a related marginal or conditional?

betanalpha commented 8 years ago

Here’s my proposal for the birthday problem as discussed at the meeting. Note that I left out the 5th, 6th, and 7th kernels, as they do not seem to induce a positive-definite covariance, so I must not be understanding correctly.

I didn’t bother to be careful about length scales and noises and their squares.

function {
  gp_cov(real x1[], real x2[], real theta[]) {
    real sigma1; real l1;
    real sigma2; real l2;
    real sigma3; real l31; real l32;
    real sigma4; real l41; real l42;
    real sigma5; real l5;
    real sigma6; real l6;
    real epsilon;

    // My kingdom for a declare and initialize outside of the parameters block!
    sigma1 <- theta[1];
    l1 <- theta[2];
    sigma2 <- theta[3];
    l2 <- theta[4];
    sigma3 <- theta[5];
    l31 <- theta[6];
    l32 <- theta[7];
    sigma4 <- theta[8];
    l41 <- theta[9];
    l42 <- theta[10];
    sigma5 <- theta[11];
    l5 <- theta[12];
    sigma6 <- theta[13];
    l6 <- theta[14];
    epsilon <- theta[15];

    return   sigma1 * cov_sq_exp(x1, x2, l1)
           + sigma2 * cov_sq_exp(x1, x2, l2)
           + sigma3 * cov_sin_exp(x1, x2, pi() / 7, l31) * cov_sq_exp(x1, x2, l32)
           + sigma4 * cov_sin_exp(x1, x2, pi() / 365, l41) * cov_sq_exp(x1, x2, l42)
           + epsilon * cov_delta(x1, x2);
  }
}

data {
  int n_train;
  int n_dim;  // for the birthday problem n_dim = 1
  matrix[n_train, n_dim] x;  // this corresponds to time for the birthday problem
  vector[n_train] y;         // this corresponds to births
  int n_test;
  matrix[n_test, n_dim] x_tilde;
}

transformed data {
  int n_hyper;
  n_hyper = 15;
}

parameters {
  real theta[n_hyper];
}

model {
  theta ~ cauchy(0, 2.5);
  y ~ gp(gp_cov, x, theta);
}

generated quantities {
  real y[n_test];
  y ~ gp_rng(gp_cov, x_test, theta);
}

bob-carpenter commented 8 years ago

Thanks. That looks about how I thought it would look. Don't we want x1 and x2 to be vectors?

A compound declare-define would be good. The only issue is how to mark which variables are saved.

I like that it uses the gp function the way Aki defined the GP distribution (is it even called that?) in BDA3.

Does the gp_rng also need the basic x as well as the x_test?

bob-carpenter commented 8 years ago

I don't see why ARD (aka hierarchical modeling) is an issue. It just means you don't share that 1 / ell^2 term across dimensions.

betanalpha commented 8 years ago

Thanks. That looks about how I thought it would look. Don't we want x1 and x2 to be vectors?

Vectors or arrays. Arrays were just the first thing that came to mind.

A compound declare-define would be good. The only issue is how to mark which variables are saved.

You mean in generated quantities? I would be completely happy if declare-defines were limited to places where there was no ambiguity: transformed data, model, and local scopes.

I like that it uses the gp function the way Aki defined the GP distribution (is it even called that?) in BDA3.

It's a "covariance function" (you could call it a GP covariance function if you wanted) which marginalizes down to a covariance matrix.

Does the gp_rng also need the basic x as well as the x_test?

Yes, you are correct it would need both.

bob-carpenter commented 8 years ago

Thanks. The birthday problem's a bit simpler than many we'll get because there's only a single length scale. Below is what I came up with using the full matrix approach, though I'm sure I botched the generated quantities --- I still don't really understand GPs very well, but am too lazy to figure out what this really needs to be. Don't we need a covariance matrix involving both x and x_tilde? And then we only want to generate y_tilde? So what I wrote can't be right. But then neither can Michael's because he's redeclaring the data variable y in a way I couldn't figure out.

P.S. Using four binary sums leads to four expressions per entry, whereas it'd be better to just have a sum(). Michael's approach has the same issue: a chained sum isn't as efficient as a vectorized sum because of the additional subexpressions generated. On the other hand, given everything else, it may not even be noticeable.

functions {
  matrix cov_fun(vector[] x, real sigma, real l) {
    return cov_sq_exp(x, l[1], sigma[1])

bob-carpenter commented 8 years ago

Some encoding's getting messed up here.

The confusing case I have in mind is this one:

generated quantities {
  real y1 <- normal_rng(0, 1);
  y1 <- sigma * y1 + mu;
  real y2 <- normal_rng(0, 1);
}

Is y2 saved as a generated quantity? That is, do we save anything declared in the top-level scope in generated quantities?

What happens now is that there is an initial stream of declarations, then declarations aren't allowed after that. So the above would be

generated quantities {
  real y1;
  real y2;
  y1 <- normal_rng(0, 1);
  y1 <- sigma * y1 + mu;
  y2 <- normal_rng(0, 1);

  // real y3; // currently illegal
}

which is not ambiguous and makes it easy to scan for which variables are saved in a block.

betanalpha commented 8 years ago

Don't we need a covariance matrix involving both x and x_tilde? And then we only want to generate y_tilde? So what I wrote can't be right. But then neither can Michael's because he's redeclaring the data variable y in a way I couldn't figure out.

No, that’s right. You need an (n_train x n_train) covariance matrix to learn the hyperparameters, and then just the (n_train x n_test) part of the ((n_train + n_test) x (n_train + n_test)) joint covariance matrix to sample the y_tilde arising from the x_tilde.
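
In block form, this is the standard partition of the joint covariance:

$$ K_{\text{joint}} = \begin{pmatrix} K(x, x) & K(x, \tilde{x}) \\ K(\tilde{x}, x) & K(\tilde{x}, \tilde{x}) \end{pmatrix}, $$

with the (n_train x n_train) block used to learn the hyperparameters and the (n_train x n_test) block used for prediction.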

betanalpha commented 8 years ago

Right, so I would say that declarations should still be limited to the beginning of a scope. I can see how this would be limiting but I'm just interested in allowing all of the quick one-line definitions that we're missing out on now.

bob-carpenter commented 8 years ago

So the first one (y1) is a generated quantity but y2 is not, because there's an assignment to y1 in between.

That would be easiest to implement, but I feared it would be very confusing to users.

betanalpha commented 8 years ago

Sure, understandable. Again, if we limited such inline definitions to transformed data, model, and local scopes then this wouldn’t be an issue.

bob-carpenter commented 8 years ago

I don't understand what you think the behavior should be if the user tries this:

transformed data {
  real y1 <- normal_rng(0, 1);
  y1 <- sigma * y1 + mu;
  real y2 <- normal_rng(0, 1);
}

The following options seem reasonable:

  1. disallow it, because there's a declaration after the initial list of declarations, or
  2. allow it, and

    a. treat the y2 as a local variable because it's after the initial run of declarations, or

    b. treat the y2 as transformed data.

betanalpha commented 8 years ago

Gah, you’re right. My initial instinct is to say (1) but I can see how that would be confusing, especially to people familiar with coding.

Also, this should probably be its own thread. There’s an open issue for this in the stan repo.
