statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

M-estimators: heteroscedasticity and correlation robust standard errors #1379

Open josef-pkt opened 10 years ago

josef-pkt commented 10 years ago

For the implementation, it looks like we could reuse the sandwich covariance code.

For RLM: H1 is the analog of the OLS "nonrobust" covariance. I haven't figured out whether H3 is HC; it has a summation term that looks similar to HC0, but it doesn't behave like an HC robust covariance in an example.

edit: H3 is not HC: all three specialize to the OLS variance for squared loss. The summation includes psi_deriv times exog, not psi times exog; see Huber, Peter J. 1973. “Robust Regression: Asymptotics, Conjectures and Monte Carlo.” The Annals of Statistics 1 (5): 799–821.

(In an example, having a heteroscedasticity robust cov didn't seem very important, since RLM has endogenous weighting that discounts outliers, i.e. high variance, noisy observations.)
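To make the psi vs psi_deriv distinction concrete, here is a rough sketch of what an HC0-style sandwich for RLM could look like, built only from the norm's psi and psi_deriv and treating the scale as fixed (the function name is made up and the neglect of scale variability is an assumption, not an existing statsmodels API); with the squared-loss norm it should reduce to the OLS HC0 covariance:

```python
import numpy as np

def rlm_cov_hc0(results):
    """Sketch of an HC0-style sandwich covariance for a fitted RLM.

    The bread uses psi_deriv(resid / scale); the meat uses psi(resid / scale)**2.
    Variability of the scale estimate is ignored (asymptotic orthogonality
    under symmetric errors is assumed). Not a verified implementation.
    """
    exog = results.model.exog
    norm = results.model.M
    u = results.resid / results.scale
    # bread: inverse of the derivative of the estimating equation sum_i psi(u_i) x_i = 0
    bread = np.linalg.inv(exog.T @ (norm.psi_deriv(u)[:, None] * exog) / results.scale)
    # meat: outer product of the per-observation estimating equation
    meat = exog.T @ (norm.psi(u)[:, None] ** 2 * exog)
    return bread @ meat @ bread.T
```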

I didn't find any references for HC for RLM, but there are some for autocorrelated and spatially correlated errors, which just look like HAC with a truncated uniform kernel.

Cui, Hengjian, Xuming He, and Kai W. Ng. 2004. “M-Estimation for Linear Models with Spatially-Correlated Errors.” Statistics & Probability Letters 66 (4): 383–93. doi:10.1016/j.spl.2003.10.018.

Fan, Jun, Ailing Yan, and Naihua Xiu. 2014. “Asymptotic Properties for M-Estimators in Linear Models with Dependent Random Errors.” Journal of Statistical Planning and Inference. doi:10.1016/j.jspi.2013.12.005. http://www.sciencedirect.com/science/article/pii/S0378375813003078.

(The implementation by analogy doesn't look very difficult, but the theory looks hairy and the assumptions for the proofs are pretty restrictive.)

All the robust statistics books seem to assume iid errors, or independence/uncorrelatedness of x and the variance of the error u, and of u across observations. Note that there is the assumption that the error distribution is contaminated, which allows for some distortion of the iid assumption. Maronna, Martin and Yohai have a chapter on time series models, which I didn't read and which I guess assumes correct specification except for the contamination.

josef-pkt commented 10 years ago

no answer in http://stats.stackexchange.com/questions/84347/sandwich-covariance-for-robust-regression-using-m-estimators-for-data-exhibiting

josef-pkt commented 7 years ago

(based on mailing list discussion related to #3258)

Field, Christopher, and Julie Zhou. 2003. “Confidence Intervals Based on Robust Regression.” Journal of Statistical Planning and Inference 115 (2): 425–39. doi:10.1016/S0378-3758(02)00168-4.

What they call TLH, for Lumley and Heagerty, looks like standard HAC with a Bartlett kernel to me. (It's written by statisticians, so they don't reference the econometrics literature.) However, they do pretesting for autocorrelation, which the econometrics literature recommends against for sandwich robust covariances.

josef-pkt commented 7 years ago

Just some thoughts. (Implementing it should go pretty fast, except for the verification needed for unit tests and some design decisions for defaults.)

add get_robustcov_results and cov_type extras:

Note: as it is currently, we only get cov_params for the mean parameters; the variance/scale estimate is asymptotically orthogonal under ... (symmetry).

An idea for a unit test: RLM has the Gaussian (squared loss) norm as a special case. I'm not sure it is fully unit tested, but we can compare RLM with the Gaussian norm against OLS and compare the corresponding sandwich cov_types; see the sketch below. Another option would be to define rho/norm from a symmetric density like the t distribution. We might have to add an option to replace the Huber proposal 2 scale calculation with something distribution specific, e.g. using weights corresponding to the t distribution. Then we would have other distributions to compare with. (Currently we have a Cauchy norm, but I'm not sure what the MLE would be in that case, given that the moments don't exist.) (Somewhere I read that in R the robust estimation results can be used with the sandwich package, but I don't think I want to struggle with that, at least not yet.)
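A minimal version of the RLM-vs-OLS check could look like this (simulated data; the tolerances are guesses, and the assertion on the H1-H3 options is based on the observation above that all three specialize to the OLS variance for squared loss):

```python
import numpy as np
from numpy.testing import assert_allclose
import statsmodels.api as sm
from statsmodels.robust import norms

rng = np.random.default_rng(12345)
nobs = 500
exog = sm.add_constant(rng.standard_normal((nobs, 3)))
endog = exog @ np.array([1.0, 0.5, -0.5, 0.25]) + rng.standard_normal(nobs)

res_ols = sm.OLS(endog, exog).fit()

# squared loss turns RLM into (iterated) least squares
for cov in ["H1", "H2", "H3"]:
    res_rlm = sm.RLM(endog, exog, M=norms.LeastSquares()).fit(cov=cov)
    assert_allclose(res_rlm.params, res_ols.params, rtol=1e-8)
    # all three H-type covariances should specialize to the OLS nonrobust variance
    assert_allclose(res_rlm.bse, res_ols.bse, rtol=1e-6)
```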

(One implementation detail: do we want the robust norm weights in the score function, as in IRLS, or the "psi" (derivative of rho) function? The two are equivalent; try to be consistent with GLM, which I guess means score_factor = psi(resid / scale) and score = (score_factor[:, None] * exog).sum(0).)
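A quick numerical check of that equivalence (score_factor, score_obs and score follow the GLM naming convention and are not existing RLM attributes):

```python
import numpy as np
from statsmodels.robust import norms

norm = norms.HuberT()
rng = np.random.default_rng(0)
u = rng.standard_normal(25)            # stands in for resid / scale
exog = rng.standard_normal((25, 3))

# psi form vs IRLS-weights form of the same quantity: psi(u) == weights(u) * u
score_factor = norm.psi(u)
assert np.allclose(score_factor, norm.weights(u) * u)

# per-observation score and the summed score, GLM style
score_obs = score_factor[:, None] * exog
score = score_obs.sum(0)
```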

josef-pkt commented 4 years ago

Illustrating a use case for an HC cov_type in RLM:

In #6526 I'm looking at Yuen-Welch anova, robust to some outliers because of the use of trimmed means, in comparison with oneway anova that is robust to heteroscedasticity.

That is a simple k-sample comparison with exog as a single categorical variable, where an HC cov_type would allow for different variances across the k samples; see the sketch below.
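A rough sketch of that setup (simulated data; group sizes, the injected outliers, and the HuberT norm are arbitrary illustrative choices, and the HC cov_type itself is still the missing piece):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.robust import norms

rng = np.random.default_rng(42)
# three samples with different variances, plus a few gross outliers
groups = np.repeat(["a", "b", "c"], [30, 40, 50])
endog = np.concatenate([
    1.0 + 0.5 * rng.standard_normal(30),
    1.5 + 1.0 * rng.standard_normal(40),
    2.0 + 2.0 * rng.standard_normal(50),
])
endog[:3] += 10  # outliers in the first sample

# exog is a single categorical: intercept plus dummies
exog = sm.add_constant(pd.get_dummies(pd.Series(groups), drop_first=True, dtype=float))

res = sm.RLM(endog, exog, M=norms.HuberT()).fit()
# the H1-H3 covariances assume a common error variance; an HC cov_type
# would instead allow each of the k samples its own variance
print(res.summary())
```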

Variation: using a oneway variance comparison like Levene-Welch anova, we could use the MAD or some other transformation of the residuals as endog in RLM for a simple outlier robust comparison and test of variances.

josef-pkt commented 3 years ago

A question about RLM with cluster robust standard errors: https://stats.stackexchange.com/questions/519249/outlier-robust-regression-with-clustered-standard-errors

Another idea: in GLM I added an option to attach the final WLS instance, which could then be used to compute a robust cov. RLM.fit only uses MinimalWLS for the IRLS optimization, so we would need just the weights to create the WLS instance; see the sketch below.
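A sketch of that idea, assuming the final IRLS weights in RLMResults.weights are the right ones to carry over; whether the resulting WLS sandwich is valid for RLM is exactly what would need verification:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.robust import norms

rng = np.random.default_rng(0)
nobs, n_groups = 200, 20
groups = rng.integers(n_groups, size=nobs)
exog = sm.add_constant(rng.standard_normal((nobs, 2)))
endog = exog @ np.array([1.0, 0.5, -0.5]) \
    + rng.standard_normal(n_groups)[groups] + rng.standard_normal(nobs)
endog[:5] += 15  # a few gross outliers

res_rlm = sm.RLM(endog, exog, M=norms.HuberT()).fit()

# reuse the final IRLS weights in a WLS fit and let WLS compute the
# cluster-robust covariance
res_wls = sm.WLS(endog, exog, weights=res_rlm.weights).fit(
    cov_type="cluster", cov_kwds={"groups": groups})
print(res_rlm.params)
print(res_wls.params)   # (nearly) identical to the RLM point estimates
print(res_wls.bse)      # candidate cluster-robust standard errors (to be verified)
```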

RLM uses a different version of iterated WLS than GLM; there are no working residuals to take the role of endog in WLS. In GLM, the WLS cov is the expected information matrix, not the observed information matrix (hessian). (The default cov could differ between IRLS and gradient optimizers with a non-canonical GLM link.) I never looked at that for RLM. (RLM has a linear link function, so there might not be a difference between the expected and observed hessian.)

The problem is in writing unit tests and verifying that it is "correct". Also, in the case of correlation, there will be some assumption about the interpretation of outliers and whether they spill over across observations, e.g. the three types of outliers in the time series literature.

related #3273

josef-pkt commented 2 years ago

Croux, Christophe, Geert Dhaene, and Dirk Hoorelbeke. "Robust standard errors for robust estimators." CES-Discussion paper series (DPS) 03.16 (2004): 1-20.

Includes a GMM derivation of HC and HAC cov_params for MM-estimators, with references.

josef-pkt commented 1 year ago

Interesting idea: use the RLM estimated weights in WLS to get a robust cov.

https://stats.stackexchange.com/questions/519249/outlier-robust-regression-with-clustered-standard-errors/610964#610964

I did not look at the details yet, e.g. what is the effect of the scale estimator? Does a WLS sandwich cov_params correctly take into account the "trimming" by redescending norms?

josef-pkt commented 1 year ago

another reference

Maronna, Ricardo A., Douglas Martin, and Víctor J. Yohai. Robust Statistics: Theory and Methods. Reprinted with corr. Wiley Series in Probability and Statistics. Chichester: Wiley, 2006. section 5.13.2 Estimating the asymptotic covariance matrix under heteroskedastic errors

The section has references (including the Croux et al. discussion paper) and derives the HC sandwich for the MM-estimator by stacking moment conditions. This looks like the standard form with the derivative of the moment condition instead of the hessian (#8789). We don't have a hessian in RLM, but I think we can get it easily using norm.psi_deriv; RLM also does not have score and score_obs. See the sketch below.
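For concreteness, hypothetical helpers built only from norm.psi and norm.psi_deriv, with the scale treated as fixed (these are not existing RLM methods; they use the same ingredients as the sandwich sketch in the first comment):

```python
import numpy as np

def rlm_score_obs(results):
    """Per-observation estimating equation psi(resid_i / scale) * x_i (scale held fixed)."""
    u = results.resid / results.scale
    return results.model.M.psi(u)[:, None] * results.model.exog

def rlm_hessian(results):
    """Derivative of the estimating equation w.r.t. params: -X' diag(psi'(u)) X / scale."""
    exog = results.model.exog
    u = results.resid / results.scale
    return -exog.T @ (results.model.M.psi_deriv(u)[:, None] * exog) / results.scale
```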

josef-pkt commented 8 months ago

another reference

Cheng, Tsung-Chi. “On Simultaneously Identifying Outliers and Heteroscedasticity without Specific Form.” Computational Statistics & Data Analysis 56, no. 7 (July 1, 2012): 2258–72. https://doi.org/10.1016/j.csda.2012.01.004.

Uses HC1-HC4 sandwiches for a weighted least absolute deviation (LAD) estimator. However, it uses the WLS HC cov_params, which AFAIK is incorrect because of the kink, i.e. the nondifferentiability of LAD (as in quantile regression). That is, the approach would apply to RLM with a differentiable norm.