tlienart opened this issue 5 years ago
There are also algorithms designed specifically for missing data, for example https://arxiv.org/pdf/1201.2577.pdf.
OK, so that's a Lasso-type problem on a slightly modified observed covariance (eq. (1.5) in the paper). I guess that can be added once we have a (Graphical) Lasso estimator for the covariance.
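To make the "slightly modified observed covariance" concrete, here is a minimal sketch in Python/NumPy. It assumes (my reading, not checked against eq. (1.5)) that the data are zero-mean and each entry is observed independently with probability `delta`. Zero-filling the missing entries then shrinks off-diagonal covariances by `delta**2` and variances by `delta` in expectation, so each part is rescaled separately. The penalized step is shown as a trace-norm (nuclear-norm) penalized least-squares fit, a matrix analogue of the Lasso whose minimizer is obtained by soft-thresholding eigenvalues; whether this is the paper's exact penalty is an assumption. Both function names are hypothetical.

```python
import numpy as np


def missing_at_random_covariance(X, delta):
    """Rescaled covariance for entries missing completely at random.

    Assumptions (hypothetical sketch): rows of X have zero mean, each entry is
    observed independently with probability delta, and missing entries are NaN.
    """
    Y = np.nan_to_num(np.asarray(X, dtype=float), nan=0.0)  # zero-fill missing
    n = Y.shape[0]
    S_delta = Y.T @ Y / n
    # In expectation, off-diagonal entries are shrunk by delta**2 and
    # diagonal entries by delta, so undo each factor separately.
    S = S_delta / delta**2
    np.fill_diagonal(S, np.diag(S_delta) / delta)
    return S


def trace_norm_fit(S, lam):
    """argmin_M ||S - M||_F^2 + lam * ||M||_* for symmetric S.

    The minimizer soft-thresholds the eigenvalues of S at lam / 2.
    """
    w, V = np.linalg.eigh((S + S.T) / 2)
    w_shrunk = np.sign(w) * np.maximum(np.abs(w) - lam / 2, 0.0)
    return (V * w_shrunk) @ V.T


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
delta = 0.8
X[rng.random(X.shape) > delta] = np.nan  # hide roughly 20% of entries
S_hat = trace_norm_fit(missing_at_random_covariance(X, delta), lam=0.1)
```

The two steps are kept separate so the rescaled covariance could equally be handed to a (Graphical) Lasso estimator instead of the trace-norm fit.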
Consider exporting a shrinkage method that relies only on the matrix S and not on the underlying matrix of samples, X (I note that `analytical_nonlinear_shrinkage` appears to use only S, not X). The motivation is that stock data typically have missing samples, so a complete matrix X cannot be constructed. Instead, pairwise covariances can be computed to form the elements of a matrix T, although T is not guaranteed to be positive semidefinite because its elements are computed on inconsistent data sets.
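A minimal sketch of building T from pairwise-complete observations, in Python/NumPy with `np.nan` marking missing values (the function name `pairwise_covariance` is hypothetical). In the small example each pair of series is observed together on different rows, which is already enough to give T a negative eigenvalue.

```python
import numpy as np


def pairwise_covariance(X):
    """Covariance matrix T from pairwise-complete observations.

    X is an (n_samples, n_features) array with np.nan marking missing values.
    T[i, j] uses only the rows where columns i and j are both observed, so
    different entries can be based on different subsets of rows.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    T = np.full((p, p), np.nan)  # stays NaN where a pair has < 2 overlaps
    for i in range(p):
        for j in range(i, p):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            m = int(both.sum())
            if m < 2:
                continue
            xi, xj = X[both, i], X[both, j]
            cov = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (m - 1)
            T[i, j] = T[j, i] = cov
    return T


nan = np.nan
# Three series; each pair is only ever observed together on two rows.
X = np.array([[1.0, 1.0, nan],
              [2.0, 2.0, nan],
              [nan, 1.0, 1.0],
              [nan, 2.0, 2.0],
              [1.0, nan, 2.0],
              [2.0, nan, 1.0]])
T = pairwise_covariance(X)
print(np.linalg.eigvalsh(T).min())  # about -2/3, so T is not PSD
```

Here each variance is 1/3 while the pairwise covariances are +0.5, +0.5 and -0.5, i.e. implied correlations of magnitude 1.5, which no single consistent data set could produce.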
Then, consider adding the nearest correlation matrix method described here: https://nhigham.com/2013/02/13/the-nearest-correlation-matrix/ (sample code already exists in MATLAB, R and Python). With it, T can be "converted" to a positive semidefinite matrix S, which can then be fed into `analytical_nonlinear_shrinkage`.
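The method in Higham's post alternates between projecting onto the positive semidefinite cone and onto matrices with unit diagonal, using Dykstra's correction. Below is a sketch in Python/NumPy. Because the method targets correlation matrices, the hypothetical helper `nearest_psd_covariance` first scales T to unit diagonal and rescales afterwards; that wiring is my assumption about how it would feed a covariance-based estimator, not something the post prescribes. It also assumes every diagonal entry of T is strictly positive.

```python
import numpy as np


def nearest_correlation(A, tol=1e-10, max_iter=1000):
    """Nearest correlation matrix to a symmetric A (Frobenius norm), computed
    by alternating projections with Dykstra's correction."""
    Y = np.array(A, dtype=float)
    dS = np.zeros_like(Y)  # Dykstra correction for the PSD projection
    for _ in range(max_iter):
        R = Y - dS
        # Project onto the PSD cone: clip negative eigenvalues to zero.
        w, V = np.linalg.eigh(R)
        X = (V * np.maximum(w, 0.0)) @ V.T
        dS = X - R
        # Project onto the set of matrices with unit diagonal.
        Y_next = X.copy()
        np.fill_diagonal(Y_next, 1.0)
        change = np.linalg.norm(Y_next - Y) / np.linalg.norm(Y_next)
        Y = Y_next
        if change < tol:
            break
    return Y


def nearest_psd_covariance(T):
    """Hypothetical helper: rescale T to unit diagonal, repair it, scale back.
    Assumes every diagonal entry of T is strictly positive."""
    sd = np.sqrt(np.diag(T))
    scale = np.outer(sd, sd)
    return nearest_correlation(T / scale) * scale


# A symmetric "covariance" whose implied correlations exceed 1 in magnitude.
T = np.array([[1 / 3, 0.5, -0.5],
              [0.5, 1 / 3, 0.5],
              [-0.5, 0.5, 1 / 3]])
S = nearest_psd_covariance(T)
print(np.linalg.eigvalsh(S).min())  # non-negative up to the tolerance
```

The repaired S keeps the variances of T on its diagonal and is positive semidefinite up to the stopping tolerance, which is the kind of input a shrinkage estimator that only needs S would expect.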
This looks like a good approach; I could review and merge a pull request that adds it. I don't personally need this functionality at the moment, so I'm not going to work on it myself.
Probably for a future point:
I don't think that's ideal (using both `Statistics` and `StatsBase`). See also the covrob R package, where a function to filter missing values can be provided. It would seem pretty easy to at least implement
And then we could suggest imputation, maybe via Impute.jl.
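The "function to filter missing values" idea can be sketched as a covariance routine that accepts a user-supplied missing-value handler. Two simple handlers are shown in Python/NumPy: listwise deletion (keep only complete rows) and column-mean imputation. The latter is only a crude stand-in for a real imputation package such as Impute.jl, whose API is not used or assumed here. All function names are hypothetical.

```python
import numpy as np


def complete_cases(X):
    """Listwise deletion: keep only the rows with no NaN."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]


def mean_impute(X):
    """Replace each NaN with its column's observed mean (returns a copy)."""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def covariance_with_filter(X, missing=complete_cases):
    """Sample covariance after applying a user-supplied missing-value filter."""
    Z = missing(X)
    return np.cov(Z, rowvar=False)


X = np.array([[1.0, 2.0], [np.nan, 3.0], [3.0, 4.0]])
S_listwise = covariance_with_filter(X)
S_imputed = covariance_with_filter(X, missing=mean_impute)
```

Passing the handler as a function keeps the estimator itself unchanged; note that mean imputation tends to understate variances, so it is mainly useful as a baseline.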