mateuszbaran / CovarianceEstimation.jl

Lightweight robust covariance estimation in Julia
MIT License

CovarianceEstimation.jl

Lightweight robust covariance estimation in Julia: if you have a data matrix X of size n×p corresponding to n observations of p features, this package helps you obtain an estimator of the associated p×p covariance matrix.

Note: if you are interested in covariance estimation in the context of linear regression, consider, for now, the CovarianceMatrices.jl package, which focuses on that case.

Quick start

```julia
using CovarianceEstimation

X = randn(5, 7)   # data matrix: n = 5 observations (rows), p = 7 features (columns)

# sample covariance, uncorrected (κ = 1/n, the default) and corrected (κ = 1/(n-1))
S_uncorrected  = cov(SimpleCovariance(), X)
S_corrected    = cov(SimpleCovariance(corrected=true), X)

# using linear shrinkage with different targets
LSE = LinearShrinkage
# - Ledoit-Wolf target + shrinkage
method = LSE(ConstantCorrelation())
S_ledoitwolf = cov(method, X)
# - Chen target + shrinkage (using the more verbose keyword call)
method = LSE(target=DiagonalCommonVariance(), shrinkage=:rblw)
S_chen_rblw = cov(method, X)
method = LSE(target=DiagonalCommonVariance(), shrinkage=:oas)
S_chen_oas = cov(method, X)

# a pre-defined shrinkage can be used as well
method = LinearShrinkage(DiagonalUnitVariance(), 0.5)
# using the given shrinkage intensity 0.5
S_05 = cov(method, X)
```
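To illustrate what the shrinkage calls above compute, here is a minimal plain-Julia sketch of the linear shrinkage formula Σ̂ = (1 - λ) S + λ F, using only the Statistics and LinearAlgebra standard libraries. The helpers `shrink`, `unit_variance_target`, `common_variance_target` and `oas_intensity` are hypothetical and are not the package's API. The two targets follow their usual definitions (F = I and F = tr(S)/p · I), the OAS intensity follows the oracle approximating shrinkage formula of Chen et al. as we recall it, and S is assumed to be the uncorrected sample covariance; the package's own normalization and edge-case handling may differ.

```julia
using Statistics, LinearAlgebra

# Hypothetical helpers (not the package API) sketching linear shrinkage:
# Σ̂ = (1 - λ) S + λ F for a target F and an intensity λ in [0, 1].
shrink(S, F, λ) = (1 - λ) .* S .+ λ .* F

# Target F = I (the idea behind DiagonalUnitVariance).
unit_variance_target(S) = Matrix{eltype(S)}(I, size(S)...)

# Target F = (tr(S)/p) I (the idea behind DiagonalCommonVariance).
common_variance_target(S) = (tr(S) / size(S, 1)) * Matrix{eltype(S)}(I, size(S)...)

# Assumed OAS intensity (Chen et al.), with S the 1/n sample covariance:
#   λ = min(1, ((1 - 2/p) tr(S²) + tr(S)²) / ((n + 1 - 2/p) (tr(S²) - tr(S)²/p)))
function oas_intensity(S, n)
    p = size(S, 1)
    trS, trS2 = tr(S), tr(S * S)
    num = (1 - 2 / p) * trS2 + trS^2
    den = (n + 1 - 2 / p) * (trS2 - trS^2 / p)
    # den is zero when S is already proportional to I: shrink fully to the target
    return den <= 0 ? 1.0 : min(1.0, num / den)
end

X = randn(5, 7)                        # n = 5 observations, p = 7 features
n = size(X, 1)
S = cov(X; corrected = false)          # uncorrected sample covariance (κ = 1/n)

Σ_05  = shrink(S, unit_variance_target(S), 0.5)        # fixed intensity λ = 0.5
λ_oas = oas_intensity(S, n)                              # data-driven intensity
Σ_oas = shrink(S, common_variance_target(S), λ_oas)
```

Because the common-variance target has the same trace as S, shrinking towards it leaves the total variance tr(S) unchanged while pulling the eigenvalues of S towards their mean.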

Currently supported algorithms

In this section, X is the data matrix of size n × p, and S is the sample covariance matrix S = κ (Xc' * Xc), where Xc is the centred data matrix and κ is either 1/n (uncorrected) or 1/(n-1) (corrected) (see docs).

Time complexity of computing S: O(p²n), with a low constant.
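To make the definition of S concrete, here is a minimal sketch in plain Julia using only the Statistics and LinearAlgebra standard libraries. The helper `sample_covariance` is hypothetical and not part of the package; it simply spells out S = κ (Xc' * Xc) and cross-checks the result against `Statistics.cov`.

```julia
using Statistics, LinearAlgebra

# Hypothetical helper: S = κ · Xc' * Xc, where Xc is the column-centred data
# and κ = 1/n (uncorrected) or 1/(n-1) (corrected).
function sample_covariance(X::AbstractMatrix{<:Real}; corrected::Bool = false)
    n = size(X, 1)                      # number of observations (rows)
    Xc = X .- mean(X, dims = 1)         # subtract each column's mean
    κ = corrected ? 1 / (n - 1) : 1 / n
    S = κ * (Xc' * Xc)                  # p×p result; the product costs O(p²n)
    return Symmetric(S)                 # S is symmetric by construction
end

X = randn(5, 7)                         # n = 5 observations, p = 7 features
S_uncorrected = sample_covariance(X)
S_corrected   = sample_covariance(X; corrected = true)

# Cross-check against the Statistics standard library (observations in rows).
@show S_corrected ≈ cov(X)
@show S_uncorrected ≈ cov(X; corrected = false)
```

The single matrix product Xc' * Xc dominates the cost, which is where the O(p²n) complexity comes from.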

Sample covariance based methods

These methods build a covariance estimator derived from S. They are implemented using the abstract covariance estimation interface from StatsBase.jl.
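Building an estimator from S alone maps naturally onto Julia's multiple dispatch. The sketch below uses hypothetical `Sketch*` types (not the package's own types) to build two shrinkage targets from S: the diagonal of S, and the Ledoit-Wolf constant-correlation target, which keeps the sample variances and replaces every off-diagonal correlation by their average. These are the standard textbook definitions; we assume, but have not verified, that they match the package's implementation.

```julia
using Statistics, LinearAlgebra

# Hypothetical target types, prefixed "Sketch" to avoid clashing with the
# package's own types; each `target` method builds a p×p matrix F from S alone.
abstract type SketchTarget end
struct SketchDiagonalUnequalVariance <: SketchTarget end   # F = diag(S)
struct SketchConstantCorrelation <: SketchTarget end       # Ledoit-Wolf style

target(::SketchDiagonalUnequalVariance, S::AbstractMatrix) = Matrix(Diagonal(diag(S)))

function target(::SketchConstantCorrelation, S::AbstractMatrix)
    p = size(S, 1)
    σ = sqrt.(diag(S))                       # per-feature standard deviations
    R = S ./ (σ * σ')                        # sample correlation matrix
    rbar = (sum(R) - tr(R)) / (p * (p - 1))  # mean of the off-diagonal correlations
    F = rbar .* (σ * σ')                     # constant correlation rbar everywhere
    F[diagind(F)] .= diag(S)                 # keep the sample variances on the diagonal
    return F
end

X = randn(50, 4)                             # n = 50 observations, p = 4 features
S = cov(X)
F_cc = target(SketchConstantCorrelation(), S)
F_du = target(SketchDiagonalUnequalVariance(), S)
```

Adding a new target in this style only requires a new type and one `target` method; the shrinkage step itself does not need to change.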

Time complexity:

Other estimators (coming)

These estimators may be implemented in the future; see also this review paper.

For heteroskedasticity- and autocorrelation-consistent (HAC) estimators, and other estimators of the covariance of regression coefficients, you can currently use the CovarianceMatrices.jl package.

Comparison to existing libraries

Rough benchmarks were run on random matrices of various sizes (40×20, 20×40, 400×200, 200×400). As usual, these benchmarks should be taken with a pinch of salt, but a significant speedup over existing libraries can be expected on a standard problem.

References