mateuszbaran / CovarianceEstimation.jl

Lightweight robust covariance estimation in Julia
MIT License
42 stars 7 forks source link

Possible changes and extensions #90

Open timholy opened 11 months ago

timholy commented 11 months ago

I'd like to check receptivity/thoughts on a couple of possible extensions and changes:

mateuszbaran commented 11 months ago

Sure, I'm definitely open to doing some changes. Currently I don't have enough time to work on new features myself but I can do review and maintenance.

  • Would you be open to switching the return type for some methods to something that encodes both the matrix and its factorization? (For example, PDMats does this.) The factorization is already computed for some shrinkage methods and it seems a shame to throw all that work away, especially if the only thing you're computing the covariance matrix for is Mahalanobis distance calculations for which the factorized matrix is much more efficient.

Sure, maybe the best solution would be adding a parameter that tells whether to return the covariance matrix or its factorization.

  • For very high dimensional applications, I'm interested in "spiked" covariance models in which some eigenvalues are large and the remaining eigenvalues are assumed to be equal to one other. This allows one to efficiently represent the matrix in Woodbury form as an isotropic (or more generally, diagonal) matrix + a low-rank correction. There are also nonlinear shrinkage methods for this case that I'd be happy to contribute. But it would require a new WoodburyCovariance(rank::Int) <: CovarianceEstimator which I'd be happy to add if you're willing to take both WoodburyMatrices.jl and probably TSVD.jl on as dependencies. Alternately, we could make them package extensions but we'd need to export the WoodburyCovariance name from the main package. That's a bit ugly if, for example, you ever want to use Aqua (you'll fail the "no undefined exports" test, though you can disable it).

This looks like an interesting extension and it would be great to have it implemented here :+1: . Both WoodburyMatrices.jl and TSVD.jl look fairly lightweight so if they wouldn't increase loading time too much, I would be fine with adding them as dependencies.

timholy commented 11 months ago

adding a parameter that tells whether to return the covariance matrix or its factorization

With PDMats you return both: https://github.com/JuliaStats/PDMats.jl/blob/e0cad7caf7d955d5a14fe830f3684026a9652ad7/src/pdmat.jl#L4-L9. Note that AbstractPDMat{T} <: AbstractMatrix{T}, so the returned object acts like "the matrix" for purposes like C[i, j] and C * v. But for C \ v it instead uses the factorization. Essentially, the way to think about PDMats is as a way of caching the matrix and its factorization so you get good performance even if you use \ many times and in many different locations.

I would be fine with adding them as dependencies.

Great! I'll get cracking.

mateuszbaran commented 11 months ago

OK, cool. Returning both is also fine.