statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

Reference: scaling of scatter matrix to get covariance #3220

Open josef-pkt opened 8 years ago

josef-pkt commented 8 years ago

(parking a reference to computational detail)

How do we normalize a scatter matrix so that it is consistent for a specific distribution, commonly the normal? cov = sigma = c * scatter, so we need to find the "size" c.

Related: MAD, IQR and similar scale estimates have normalization constants; here it is for the multivariate case.
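For reference, the univariate constants. This is a minimal sketch assuming the normal distribution as the target: each factor is the reciprocal of the corresponding population quantity for a standard normal.

```python
from scipy import stats

# Factors that turn the MAD and the IQR of normal data into estimates of sigma.
mad_const = 1 / stats.norm.ppf(0.75)                          # about 1.4826
iqr_const = 1 / (stats.norm.ppf(0.75) - stats.norm.ppf(0.25))  # about 0.7413
```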

Maronna et al. (2006), textbook on robust statistics, section 6.3.2, p. 186: using the chi2 distribution of the squared Mahalanobis distances d_i we can calculate c = median({d_i}_i) / chi2.ppf(0.5, k_vars)
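A minimal sketch of this median-based size correction, assuming multivariate normal data. The det-normalized sample covariance is only a stand-in for an arbitrary scatter (shape) estimate, and the mean is used as location for simplicity; in a robust setting both would come from the robust estimator. Because the d_i are squared distances computed with the unscaled scatter, d_i / c follows chi2(k_vars) at the normal, so matching medians gives c.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
k_vars, nobs = 3, 20000
cov_true = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])
x = rng.multivariate_normal(np.zeros(k_vars), cov_true, size=nobs)

# Stand-in for a scatter estimate that is only proportional to the covariance:
# the sample covariance rescaled to det(scatter) = 1.
scatter = np.cov(x, rowvar=False)
scatter /= np.linalg.det(scatter) ** (1 / k_vars)

# Squared Mahalanobis distances with respect to the unscaled scatter matrix.
xc = x - x.mean(axis=0)
d = np.einsum("ij,jk,ik->i", xc, np.linalg.inv(scatter), xc)

# If cov = c * scatter, then d_i / c ~ chi2(k_vars) at the normal; match the medians.
c = np.median(d) / stats.chi2.ppf(0.5, k_vars)
cov = c * scatter
```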

This has also been used, without a reference, in Maronna and Zamar 2002 for cov_ogk. I didn't see anything mentioned about the "size" estimate for the Tyler scatter estimator in elliptical distributions.

There are several references (*2) for consistency and small sample scaling of MCD and similar estimators, but I didn't look carefully (brief browsing or skimming doesn't show any obvious answer). Many articles just mention the scaling factors but don't show the numbers or formulas.

Maronna, Ricardo A., Douglas Martin, and Víctor J. Yohai. 2006. Robust Statistics: Theory and Methods. Reprinted with corr. Wiley Series in Probability and Statistics. Chichester: Wiley.

Maronna, Ricardo A., and Ruben H. Zamar. 2002. “Robust Estimates of Location and Dispersion for High-Dimensional Datasets.” Technometrics 44 (4): 307–17. doi:10.1198/004017002188618509.

(*2) Hardin, Johanna, and David M. Rocke. 2005. “The Distribution of Robust Distances.” Journal of Computational and Graphical Statistics 14 (4): 928–46. doi:10.1198/106186005X77685.

Pison, G., S. Van Aelst, and G. Willems. 2002. “Small Sample Corrections for LTS and MCD.” Metrika 55 (1–2): 111–23. doi:10.1007/s001840200191.

josef-pkt commented 8 years ago

adding this here:

We should have some helper functions or additional methods attached to the robust norms to calculate consistency factors, relative efficiency and similar quantities. Those are needed all over the place, and it would be useful to find them in a central location instead of hardcoding a specific version into each function.
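A sketch of what such a helper might compute: the asymptotic efficiency of an M-estimator of location at the normal, E[psi']^2 / E[psi^2]. The names expect_normal and efficiency_location are hypothetical, and the Huber psi is written out directly instead of using the statsmodels norm classes. With t = 1.345 this should reproduce the familiar efficiency of about 95%.

```python
import numpy as np
from scipy import integrate, stats

def huber_psi(x, t=1.345):
    return np.clip(x, -t, t)

def huber_psi_deriv(x, t=1.345):
    return 1.0 * (np.abs(x) <= t)

def expect_normal(func, points=None):
    """E[func(Z)] for Z ~ N(0, 1); the tails beyond +-12 are negligible."""
    val, _ = integrate.quad(lambda x: func(x) * stats.norm.pdf(x), -12, 12,
                            points=points, limit=200)
    return val

def efficiency_location(psi, psi_deriv, points=None):
    """Asymptotic efficiency relative to the mean at the normal: E[psi']**2 / E[psi**2]."""
    e_dpsi = expect_normal(psi_deriv, points)
    e_psi2 = expect_normal(lambda x: psi(x) ** 2, points)
    return e_dpsi ** 2 / e_psi2

# Pass the kinks of psi as break points so quad handles the discontinuity of psi'.
eff = efficiency_location(huber_psi, huber_psi_deriv, points=[-1.345, 1.345])
```

Passing the break points keeps the numerical integration accurate for norms with kinks; smooth norms would not need them.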

josef-pkt commented 8 years ago

Reminder to myself: "M:\josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_rlm_winsorized.py" has the variance calculation for the truncated mean. ELTS should also have a truncation correction somewhere.
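One form a truncation correction can take, sketched under the assumption of a normal distribution with symmetric trimming. If only the central fraction of the standardized observations is kept, their variance is that of a truncated normal, and its reciprocal is the consistency factor. The function name is hypothetical and this is not taken from the script mentioned above.

```python
import numpy as np
from scipy import stats

def trimmed_variance_consistency(frac_kept):
    """Factor that makes the variance of the central `frac_kept` fraction of
    standard normal observations consistent for the full variance."""
    c = stats.norm.ppf(0.5 + frac_kept / 2)  # symmetric truncation points +-c
    var_trunc = stats.truncnorm(-c, c).var()
    return 1 / var_trunc

factor = trimmed_variance_consistency(0.5)
```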

josef-pkt commented 8 years ago

Example calculation using the scipy.stats expect method (found in an old script; I don't remember the specifics, likely trying for an M- or S-estimator of scale; log for try_robust_scale.py or try_robust_scale_iter.py)

(lines copied out of sequence)

>>> import numpy as np
>>> from scipy import stats
>>> import statsmodels.robust.norms as rnorms
>>> norm = rnorms.HuberT(2.5)
>>> norm = rnorms.TukeyBiweight()
>>> stats.norm.expect(lambda x, *args: norm.psi(np.abs(x**2)))
0.8093246617772843
>>> stats.norm.expect(lambda x, *args: rnorms.TukeyBiweight().rho(x))
0.43684963023076195
>>> stats.norm.expect(lambda x, *args: norm.rho(x))
0.3692679350253787

>>> norm = rnorms.TukeyBiweight()
>>> stats.norm.expect(lambda x, *args: x * norm.psi(x))
0.7577759186353068

>>> stats.norm.expect(lambda x, *args: x * norm.psi(x) - norm.rho(x))
0.3209262884898523
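A related computation that the expect approach is typically used for (an assumption about what the old script was after): the tuning constant of a Tukey biweight S-estimator of scale with 50% breakdown point, which solves E[rho(Z)] = 0.5 * rho(inf) at the standard normal. The rho below is normalized to rho(inf) = 1, so it differs from rnorms.TukeyBiweight().rho only by a constant factor; the solution should be close to the commonly quoted c = 1.547.

```python
import numpy as np
from scipy import integrate, optimize, stats

def biweight_rho(x, c):
    """Tukey biweight rho normalized so that rho(inf) = 1."""
    u = np.minimum(np.abs(x) / c, 1.0)
    return 1 - (1 - u ** 2) ** 3

def expected_rho(c):
    """E[rho(Z)] for Z ~ N(0, 1)."""
    inner, _ = integrate.quad(lambda x: biweight_rho(x, c) * stats.norm.pdf(x), -c, c)
    return inner + 2 * stats.norm.sf(c)  # rho == 1 outside [-c, c]

# E[rho] decreases in c, so the root is bracketed by a small and a large c.
c_50 = optimize.brentq(lambda c: expected_rho(c) - 0.5, 0.5, 5.0)
```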

josef-pkt commented 8 years ago

this is also related to #3181

Another related reference: Croux, Christophe, and Catherine Dehon. 2010. “Influence Functions of the Spearman and Kendall Correlation Measures.” Statistical Methods & Applications 19 (4): 497–515. doi:10.1007/s10260-010-0142-z.

It includes the conversion of the quadrant correlation, Kendall's tau and Spearman's rho into correlations that are consistent for the normal correlation (however, a transformed correlation matrix is not always positive semidefinite; see the later article referenced below). It also includes the asymptotic variance of the correlation coefficients at the normal distribution. In their Monte Carlo the asymptotic variance underestimates the small sample variance to various degrees, so a small sample correction would help, but the much larger distortion, bias, comes from outliers or, I guess, from non-normality in general, as in hypothesis tests for variances.

Looks useful, but I don't know where it should go.
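A small illustration of the transformations to Pearson-consistent correlations at the bivariate normal, using the standard relations rho = sin(pi/2 * tau) for Kendall, rho = 2 sin(pi/6 * rho_s) for Spearman and rho = sin(pi/2 * q) for the quadrant correlation q. The simulated data and sample size are arbitrary choices for the sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho_true = 0.6
x, y = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=20000).T

tau, _ = stats.kendalltau(x, y)
rho_s, _ = stats.spearmanr(x, y)
q = np.mean(np.sign(x - np.median(x)) * np.sign(y - np.median(y)))  # quadrant correlation

# Transformations that are consistent for the Pearson correlation at the bivariate normal.
r_kendall = np.sin(np.pi / 2 * tau)
r_spearman = 2 * np.sin(np.pi / 6 * rho_s)
r_quadrant = np.sin(np.pi / 2 * q)
```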

Boudt, Kris, Jonathan Cornelissen, and Christophe Croux. 2011. “The Gaussian Rank Correlation Estimator: Robustness Properties.” Statistics and Computing 22 (2): 471–83. doi:10.1007/s11222-011-9237-0.

It also looks at the positive semidefiniteness problem of the Kendall and Spearman correlations after the transformation to consistency with the normal correlation. Recommendation: nobs > 3 * k_vars for Kendall and nobs > 2 * k_vars for Spearman.

The Gaussian rank correlation is consistent and asymptotically efficient (same asymptotic variance as Pearson) at the normal distribution.

not sure yet where to put this
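The Gaussian rank correlation is the Pearson correlation of the normal scores of the ranks, so the resulting matrix is positive semidefinite by construction. A minimal sketch, assuming van der Waerden scores norm.ppf(rank / (nobs + 1)); the function name gaussian_rank_corr is hypothetical.

```python
import numpy as np
from scipy import stats

def gaussian_rank_corr(x):
    """Pearson correlation of the normal scores of the column-wise ranks."""
    nobs = x.shape[0]
    ranks = stats.rankdata(x, axis=0)
    scores = stats.norm.ppf(ranks / (nobs + 1))
    return np.corrcoef(scores, rowvar=False)

rng = np.random.default_rng(1)
corr_true = np.array([[1.0, 0.5, 0.2],
                      [0.5, 1.0, 0.3],
                      [0.2, 0.3, 1.0]])
x = rng.multivariate_normal(np.zeros(3), corr_true, size=5000)
r = gaussian_rank_corr(x)
```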

Something like this for the asymptotic variance, matching the examples in the two articles:


pearson
>>> import numpy as np
>>> rho = np.array([0.2, 0.8]); (1 - rho**2)**2
array([ 0.9216,  0.1296])

kendall
>>> rho = np.array([0.2, 0.8]); (1 - rho**2) * np.pi**2 * (1./9 - 4 / np.pi**2 * np.arcsin(rho / 2)**2)
array([ 1.01422912,  0.15092577])

quadrant
>>> rho = np.array([0.2, 0.8]); (1 - rho**2) * (np.pi**2 / 4 - np.arcsin(rho)**2)
array([ 2.32978184,  0.57870888])

spearman is more complicated, with terms like this (if my odeint does what I think it does)

>>> import numpy as np; from scipy import integrate
>>> integrate.odeint(lambda y, x: np.arcsin(np.sin(x) / (1 + 2 * np.cos(2 * x))), 0, t=np.linspace(0, np.arcsin(0.5), 11))
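A cross-check of the odeint call, sketched with scipy: odeint calls the right-hand side as func(y, t), so a function that ignores y and returns the term at t integrates the term cumulatively from 0. The last value should agree with a direct quad integral over [0, arcsin(0.5)]. This only checks the numerical integral, not the rest of the Spearman asymptotic variance.

```python
import numpy as np
from scipy import integrate

def term(x):
    return np.arcsin(np.sin(x) / (1 + 2 * np.cos(2 * x)))

upper = np.arcsin(0.5)
grid = np.linspace(0, upper, 11)

# odeint calls func(y, t); with dy/dt = term(t) and y(0) = 0 it returns the
# cumulative integral of term from 0 to each grid point.
cumulative = integrate.odeint(lambda y, t: term(t), 0, t=grid)[:, 0]
total, _ = integrate.quad(term, 0, upper)
```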
josef-pkt commented 8 years ago

two more references with consistency factors for covariance

I'm using Table 1 from Croux and Haesbroeck as test reference numbers (I initially wrote my function partially by trial and error to get correct results in Monte Carlo). Riani et al. have a table of numbers for the Tukey bisquare S-estimator, but I only skimmed their paper. (Another paper refers to an article in a conference book for a table, but I don't have access to it.)

They have more general expressions for elliptically symmetric distributions (based on the g function).

Croux, Christophe, and Gentiane Haesbroeck. 1999. “Influence Function and Efficiency of the Minimum Covariance Determinant Scatter Matrix Estimator.” Journal of Multivariate Analysis 71 (2): 161–90. doi:10.1006/jmva.1999.1839.

Riani, Marco, Andrea Cerioli, and Francesca Torti. 2014. “On Consistency Factors and Efficiency of Robust S-Estimators.” TEST 23 (2): 356–87. doi:10.1007/s11749-014-0357-7.
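For the raw MCD at the normal, the consistency factor is commonly written as alpha / P(chi2_{k+2} <= chi2_{k, alpha}), where alpha = h / nobs is the coverage. A minimal sketch (the function name is hypothetical; the values should still be checked against Table 1 of Croux and Haesbroeck). For k_vars = 1 it reduces to the reciprocal of the variance of a normal truncated to its central alpha fraction, which gives a simple test.

```python
import numpy as np
from scipy import stats

def mcd_consistency(alpha, k_vars):
    """Consistency factor at the normal for the raw MCD scatter with coverage alpha = h / nobs."""
    q = stats.chi2.ppf(alpha, k_vars)  # cutoff for the squared Mahalanobis distances
    return alpha / stats.chi2.cdf(q, k_vars + 2)
```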

josef-pkt commented 3 years ago

(not sure what the closest issue to this is) The multivariate t-distribution can be used to estimate a scatter matrix (I have it in some PR).

I just saw that the R package MASS has a function cov.trob ("Covariance Estimation for Multivariate t Distribution"). It should be good for unit tests and for checking how they scale the scatter matrix, or get the covariance matrix of endog.
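A minimal sketch of the multivariate t approach with known degrees of freedom, using the standard EM (iteratively reweighted) fixed point with weights (df + k) / (df + d_i). It estimates the scatter matrix of the t distribution; the covariance is df / (df - 2) times the scatter for df > 2. The function name mvt_scatter is hypothetical, and this is neither the implementation in the PR nor MASS cov.trob.

```python
import numpy as np

def mvt_scatter(x, df, maxiter=1000, tol=1e-10):
    """ML location and scatter of a multivariate t with known df (EM fixed point)."""
    nobs, k_vars = x.shape
    mean = x.mean(0)
    scatter = np.cov(x, rowvar=False)
    for _ in range(maxiter):
        xc = x - mean
        d = np.einsum("ij,jk,ik->i", xc, np.linalg.inv(scatter), xc)
        w = (df + k_vars) / (df + d)  # downweights observations far in the tails
        mean_new = w @ x / w.sum()
        xc = x - mean_new
        scatter_new = (w[:, None] * xc).T @ xc / nobs
        done = np.max(np.abs(scatter_new - scatter)) < tol
        mean, scatter = mean_new, scatter_new
        if done:
            break
    return mean, scatter

rng = np.random.default_rng(42)
df, nobs = 5, 20000
scatter_true = np.array([[1.0, 0.5], [0.5, 2.0]])
z = rng.multivariate_normal(np.zeros(2), scatter_true, size=nobs)
x = z / np.sqrt(rng.chisquare(df, size=nobs) / df)[:, None]  # multivariate t draws

mean, scatter = mvt_scatter(x, df)
cov = scatter * df / (df - 2)  # covariance exists only for df > 2
```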

josef-pkt commented 1 year ago

" In order to obtain a unique MLE we fix the scale of the estimator by assuming that \<trace of omega_inv> of the true covariance matrix is known (or arbitrarily fixed) " before equ (6) p. 420 in

Soloveychik, Ilya, and Ami Wiesel. “Performance Analysis of Tyler’s Covariance Estimator.” IEEE Transactions on Signal Processing 63, no. 2 (January 2015): 418–26. https://doi.org/10.1109/TSP.2014.2376911.

The usual normalized Tyler scatter matrix has trace(S) = p, so we can rescale the scatter matrix so that trace(cov) = sum(variances) for some robust or nonrobust variance estimates.
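A sketch of the trace-based rescaling, assuming centered data (known location zero) and a plain fixed-point iteration for Tyler's shape matrix normalized to trace(S) = k_vars. The shape is rescaled so that its trace equals the sum of squared normalized MADs, which is one possible choice of robust variances. Function names are hypothetical.

```python
import numpy as np
from scipy import stats

def tyler_shape(x, maxiter=500, tol=1e-10):
    """Tyler's M-estimator of shape for centered data, normalized to trace = k_vars."""
    nobs, k_vars = x.shape
    shape = np.eye(k_vars)
    for _ in range(maxiter):
        d = np.einsum("ij,jk,ik->i", x, np.linalg.inv(shape), x)
        shape_new = k_vars / nobs * (x / d[:, None]).T @ x
        shape_new *= k_vars / np.trace(shape_new)
        done = np.max(np.abs(shape_new - shape)) < tol
        shape = shape_new
        if done:
            break
    return shape

def mad(x):
    """MAD normalized to be consistent for sigma at the normal."""
    med = np.median(x, axis=0)
    return np.median(np.abs(x - med), axis=0) / stats.norm.ppf(0.75)

rng = np.random.default_rng(5)
cov_true = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])
# Assumption: location is known to be zero, so the data are already centered.
x = rng.multivariate_normal(np.zeros(3), cov_true, size=20000)

shape = tyler_shape(x)
cov = shape * np.sum(mad(x) ** 2) / np.trace(shape)  # trace(cov) = sum of robust variances
```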

Aside: I saw several articles on regularized or shrinkage Tyler scatter matrices (in analogy to regularizing/shrinking the sample cov).

Large overview of Tyler's scatter:

Wiesel, Ami, and Teng Zhang. “Structured Robust Covariance Estimation.” Foundations and Trends® in Signal Processing 8, no. 3 (December 21, 2015): 127–216. https://doi.org/10.1561/2000000053.

and several more recent articles (I skimmed only a few parts)

Ashurbekova, Karina, Antoine Usseglio-Carleve, Florence Forbes, and Sophie Achard. “Optimal Shrinkage for Robust Covariance Matrix Estimators in a Small Sample Size Setting,” March 2021. https://hal.science/hal-02378034.

Goes, John, Gilad Lerman, and Boaz Nadler. “Robust Sparse Covariance Estimation by Thresholding Tyler’s M-Estimator.” The Annals of Statistics 48, no. 1 (February 2020): 86–110. https://doi.org/10.1214/18-AOS1793.

Hediger, Simon, Jeffrey Näf, and Michael Wolf. “R-NL: Covariance Matrix Estimation for Elliptical Distributions Based on Nonlinear Shrinkage.” IEEE Transactions on Signal Processing 71 (2023): 1657–68. https://doi.org/10.1109/TSP.2023.3270742.

Ollila, Esa. “Linear Shrinkage of Sample Covariance Matrix or Matrices under Elliptical Distributions: A Review.” arXiv, August 9, 2023. https://doi.org/10.48550/arXiv.2308.04721.

Ollila, Esa, Daniel P. Palomar, and Frédéric Pascal. “Shrinking the Eigenvalues of M-Estimators of Covariance Matrix.” IEEE Transactions on Signal Processing 69 (2021): 256–69. https://doi.org/10.1109/TSP.2020.3043952.

Zhang, Teng, and Ami Wiesel. “Automatic Diagonal Loading for Tyler’s Robust Covariance Estimator.” In 2016 IEEE Statistical Signal Processing Workshop (SSP), 1–5, 2016. https://doi.org/10.1109/SSP.2016.7551741.

Another recent review article that looks good and is shorter than the Wiesel and Zhang mini-book:

Taskinen, Sara, Gabriel Frahm, Klaus Nordhausen, and Hannu Oja. “A Review of Tyler’s Shape Matrix and Its Extensions.” In Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler, edited by Mengxi Yi and Klaus Nordhausen, 23–41. Cham: Springer International Publishing, 2023. https://doi.org/10.1007/978-3-031-22687-8_2.

aside: Nordhausen is co-author or maintainer of several R packages that include extensions of Tyler's scatter estimation

josef-pkt commented 10 months ago

New article with explicit scale estimate for Tyler's shape matrix

Ollila, Esa, Daniel P. Palomar, and Frederic Pascal. “Affine Equivariant Tyler’s M-Estimator Applied to Tail Parameter Learning of Elliptical Distributions.” arXiv, May 7, 2023. https://doi.org/10.48550/arXiv.2305.04330.

Brief skimming: it looks like it's just the average of the inverse weights, see equation (6).

I can try it out in PR #8129
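Reading the comment above as: with Tyler weights w_i = k_vars / d_i, the scale is mean(1 / w_i) = mean(d_i) / k_vars. This is an assumption based on a brief skim, not a check against equation (6). A minimal sketch of that scaling step, with the shape matrix taken as known (trace normalized) so only the scale is estimated, on normal data where the result should be consistent for the covariance.

```python
import numpy as np

rng = np.random.default_rng(7)
k_vars, nobs = 3, 20000
cov_true = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])
x = rng.multivariate_normal(np.zeros(k_vars), cov_true, size=nobs)

# Shape matrix taken as known here, normalized to trace = k_vars like Tyler's.
shape = cov_true * k_vars / np.trace(cov_true)

d = np.einsum("ij,jk,ik->i", x, np.linalg.inv(shape), x)  # squared distances
weights = k_vars / d            # Tyler weights
scale = np.mean(1 / weights)    # average of the inverse weights
cov = scale * shape
```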

josef-pkt commented 6 months ago

In #9227 I use an M-scale to scale the shape matrix normalized to det(shape) = 1, with a consistency factor (scale_bias) at the normal distribution. In CovS it is part of the definition.
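A minimal sketch of scaling a det(shape) = 1 shape matrix with an M-scale of the Mahalanobis distances, assuming a Tukey biweight rho normalized to rho(inf) = 1. The consistency constant b = E[rho(D)] is computed at the normal, where D = sqrt(chi2_k) follows a chi distribution. The tuning constant c = 4.0 and the function names are hypothetical illustrations, not what #9227 or CovS use.

```python
import numpy as np
from scipy import integrate, optimize, stats

def biweight_rho(t, c):
    """Tukey biweight rho, normalized so that rho(inf) = 1."""
    u = np.minimum(np.abs(t) / c, 1.0)
    return 1 - (1 - u ** 2) ** 3

def mscale_b(k_vars, c):
    """Consistency constant b = E[rho(D)], D = sqrt(chi2_k) is the distance at the normal."""
    dist = stats.chi(k_vars)
    inner, _ = integrate.quad(lambda t: biweight_rho(t, c) * dist.pdf(t), 0, c)
    return inner + dist.sf(c)  # rho == 1 beyond c

def mscale(d, k_vars, c=4.0):  # c = 4.0 is an arbitrary illustrative choice
    """Solve mean(rho(d / s)) = b for s; the left side is non-increasing in s."""
    b = mscale_b(k_vars, c)
    med = np.median(d)
    return optimize.brentq(lambda s: np.mean(biweight_rho(d / s, c)) - b,
                           med * 1e-3, med * 1e3)

rng = np.random.default_rng(3)
k_vars, nobs, sigma2 = 3, 10000, 2.5
a = np.array([[2.0, 0.6, 0.0], [0.6, 1.0, 0.2], [0.0, 0.2, 1.5]])
shape = a / np.linalg.det(a) ** (1 / k_vars)  # det(shape) = 1
x = rng.multivariate_normal(np.zeros(k_vars), sigma2 * shape, size=nobs)

d = np.sqrt(np.einsum("ij,jk,ik->i", x, np.linalg.inv(shape), x))  # Mahalanobis distances
s = mscale(d, k_vars)
cov = s ** 2 * shape
```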