stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
728 stars 183 forks source link

sample mean, covariance, and correlation functions #75

Open syclik opened 9 years ago

syclik commented 9 years ago

From @bob-carpenter on February 7, 2015 0:36

It'd be nice to have sample covariance and correlation functions.

matrix[K,K] cov(matrix[N,K] x);

matrix[K,K] corr(matrix[N,K] x);

along with a matching multivariate mean

vector[K] mean(matrix[N,K] x);

Should we have the following three signatures or maybe just the first two?

matrix[K,K] cov(vector[K] x[N]);  
matrix[K,K] cov(row_vector[K] x[N]);  
matrix[K,K] cov(real x[N,K]);

and

vector[K] mean(vector[K] x[N]);

I don't think there should be a mean returning a row vector --- I think it'd be confusing.

I'd suggest a simple implementation in stan/math based on Eigen::Matrix that we'd just autodiff through for var and fvar:

template <typename T>
Matrix<T,Dynamic,Dynamic> cov(const Matrix<T,Dynamic,Dynamic>& x);

template <typename T>
Matrix<T,Dynamic,Dynamic> cov(const std::vector<Matrix<T,Dynamic,1> >& x);

template <typename T>
Matrix<T,Dynamic,Dynamic> cov(const std::vector<Matrix<T,1,Dynamic> >& x);

The last two implementations could probably be combined, but I don't see how to combine the first one with a matrix input.

It'd be a bonus if the implementation of the function could be shared with the ones using the Welford algorithm in the chains analysis.

Another bonus would be not creating more vari than necessary in the autodiff implementations by taking advantage of the symmetry in the result.

Copied from original issue: stan-dev/stan#1284

syclik commented 9 years ago

From @andrewgelman on February 7, 2015 2:59

Hi, just one thing. We currently have mean() and var(), so for parallelism these should just be cov() and corr(). A

On Feb 6, 2015, at 7:36 PM, Bob Carpenter notifications@github.com wrote:

It'd be nice to have sample covariance and correlation functions.

matrix[K,K] sample_cov(matrix[N,K] x);

matrix[K,K] sample_corr(matrix[N,K] x); Should we have the following three signatures or maybe just the first two?

matrix[K,K] sample_cov(vector[K] x[N]);
matrix[K,K] sample_cov(row_vector[K] x[N]);
matrix[K,K] sample_cov(real x[N,K]); I'd suggest a simple implementation in stan/math based on Eigen::Matrix that we'd just autodiff through for var and fvar:

template Matrix<T,Dynamic,Dynamic> sample_cov(const Matrix<T,Dynamic,Dynamic>& x);

template Matrix<T,Dynamic,Dynamic> sample_cov(const std::vector<Matrix<T,Dynamic,1> >& x);

template Matrix<T,Dynamic,Dynamic> sample_cov(const std::vector<Matrix<T,1,Dynamic> >& x); The last two implementations could probably be combined, but I don't see how to combine the first one with a matrix input.

It'd be a bonus if the implementation of the function could be shared with the ones using the Welford algorithm in the chains analysis.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/1284.

syclik commented 9 years ago

From @lionel- on April 2, 2015 11:48

could this be overloaded to accept a correlation matrix and a vector of sd as arguments? In this case, it would be equivalent to diag_pre_multiply(). And then could be used in statements such as:

Omega ~ lkj_corr(2);
sigma ~ cauchy(0, 5);
y ~ multi_normal(mu, cov(Omega, sigma));

It could also be overloaded for a diagonal matrix of sd and a cholesky factor type, so we'd have:

L ~ lkj_corr_cholesky(2);
sigma ~ cauchy(0, 5);
y ~ multi_normal(mu, cov(L, sigma));

I think those linear algebra statements such as diag_pre_multiply() get in the way of seeing the statistical ideas in the code.

syclik commented 9 years ago

From @bob-carpenter on April 2, 2015 12:20

The general rule for overloads in programming is that the functions should all do the same thing. The existing cov() computes sample covariance, whereas in cov(Omega,sigma) it'd be doing a diagonal pre-multiply.

How about a three-argument parameterization of multi_normal, as in multi_normal_corr_scale(mu, Omega, sigma) or multi_normal_corr_sd(...)? That follows the usual pattern for overloads. It'd require someone to code it and test it. Or for now, it could just call diag_pre_multiply() internally.

On Apr 2, 2015, at 10:48 PM, Lionel Henry notifications@github.com wrote:

could this be overloaded to accept a correlation matrix and a vector of sd as arguments? In this case, it would be equivalent to diag_pre_multiply(). And then could be used in statements such as:

Omega ~ lkj_corr(2); sigma ~ cauchy(0, 5); y ~ multi_normal(mu, cov(Omega, sigma));

It could also be overloaded for a diagonal matrix of sd and a cholesky factor type, so we'd have:

L ~ lkj_corr_cholesky(2); sigma ~ cauchy(0, 5); y ~ multi_normal(mu, cov(L, sigma));

I think those linear algebra statements such as diag_pre_multiply() get in the way of seeing the statistical ideas in the code.

— Reply to this email directly or view it on GitHub.

syclik commented 9 years ago

From @lionel- on April 2, 2015 12:40

hmm the way I see it, they all do the same thing: computing a covariance matrix. The difference is that they compute it based on different kind of inputs. The cov(Omega, sigma) overload computes it based on summaries of a sample (or a population, which can be framed as an hypothetical infinite sample).

The overloading of multi_normal() would be very neat. I may give it a try if I find some time.

syclik commented 9 years ago

From @bob-carpenter on April 2, 2015 13:34

On Apr 2, 2015, at 11:40 PM, Lionel Henry notifications@github.com wrote:

hmm the way I see it, they all do the same thing: computing a covariance matrix.

Good point.

The difference is that they compute it based on different kind of inputs. The cov(Omega, sigma) overload computes it based on summaries of a sample (or a population, which can be framed as an hypothetical infinite sample).

Exactly. This seems different enough to me that they should be different functions. I also just don't think cov() applied to a correlation matrix and vector of scales is transparent enough.

I'm a very literal computer scientist!

The overloading of multi_normal() would be very neat. I may give it a try if I find some time.

It'd be pretty easy if you didn't try to compute custom partials and chain rule propagations. One templated function would work with the right traits to promote results.