statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

M-estimators: heteroscedasticity and correlation robust standard errors #1379

Open josef-pkt opened 10 years ago

josef-pkt commented 10 years ago

For the implementation, it looks like we could reuse the sandwich covariance code.

For RLM: H1 is the analog of the OLS "nonrobust" covariance. I haven't figured out whether H3 is HC; it has a summation term that looks similar to HC0, but it doesn't behave like an HC robust covariance in an example.

edit: H3 is not HC: all three specialize to the OLS variance for squared loss. The summation includes psi_deriv times exog, not psi times exog; see Huber, Peter J. 1973. “Robust Regression: Asymptotics, Conjectures and Monte Carlo.” The Annals of Statistics 1 (5): 799–821.

(In an example, having a heteroscedasticity robust cov didn't seem very important, since RLM has endogenous weighting that discounts outliers, i.e. high variance, noisy observations.)
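To make the psi vs psi_deriv distinction concrete, here is a rough sketch of what an HC0-style sandwich for RLM could look like, built only from the norm's psi and psi_deriv and treating the scale as fixed (the function name is made up and the neglect of scale variability is an assumption, not an existing statsmodels API); with the squared-loss norm it should reduce to the OLS HC0 covariance:

```python
import numpy as np

def rlm_cov_hc0(results):
    """Sketch of an HC0-style sandwich covariance for a fitted RLM.

    The bread uses psi_deriv(resid / scale); the meat uses psi(resid / scale)**2.
    Variability of the scale estimate is ignored (asymptotic orthogonality
    under symmetric errors is assumed). Not a verified implementation.
    """
    exog = results.model.exog
    norm = results.model.M
    u = results.resid / results.scale
    # bread: inverse of the derivative of the estimating equation sum_i psi(u_i) x_i = 0
    bread = np.linalg.inv(exog.T @ (norm.psi_deriv(u)[:, None] * exog) / results.scale)
    # meat: outer product of the per-observation estimating equation
    meat = exog.T @ (norm.psi(u)[:, None] ** 2 * exog)
    return bread @ meat @ bread.T
```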

I didn't find any references for HC for RLM, but there are some for autocorrelated and spatially correlated errors, which just look like HAC with a truncated uniform kernel.

Cui, Hengjian, Xuming He, and Kai W. Ng. 2004. “M-Estimation for Linear Models with Spatially-Correlated Errors.” Statistics & Probability Letters 66 (4): 383–93. doi:10.1016/j.spl.2003.10.018.

Fan, Jun, Ailing Yan, and Naihua Xiu. 2014. “Asymptotic Properties for M-Estimators in Linear Models with Dependent Random Errors.” Journal of Statistical Planning and Inference. doi:10.1016/j.jspi.2013.12.005. http://www.sciencedirect.com/science/article/pii/S0378375813003078.

(The implementation by analogy doesn't look very difficult, but the theory looks hairy and the assumptions for the proofs are pretty restrictive.)

All the robust statistics books seem to assume iid errors, or independence/uncorrelatedness of x and the variance of the error u, and of u across observations. Note that there is the assumption that the error distribution is contaminated, which allows for some distortion of the iid assumption. Maronna, Martin and Yohai have a chapter on time series models, which I didn't read and which I guess assumes correct specification except for the contamination.

josef-pkt commented 10 years ago

no answer in http://stats.stackexchange.com/questions/84347/sandwich-covariance-for-robust-regression-using-m-estimators-for-data-exhibiting

josef-pkt commented 7 years ago

(based on mailing list discussion related to #3258)

Field, Christopher, and Julie Zhou. 2003. “Confidence Intervals Based on Robust Regression.” Journal of Statistical Planning and Inference 115 (2): 425–39. doi:10.1016/S0378-3758(02)00168-4.

What they call TLH, for Lumley and Heagerty, looks like standard HAC with a Bartlett kernel to me. (It's written by statisticians, so they don't reference the econometrics literature.) However, they do pretesting for autocorrelation, which the econometrics literature recommends against for sandwich robust covariances.

josef-pkt commented 7 years ago

Just some thoughts. (Implementing it should go pretty fast, except for the verification needed for unit tests and some design decisions for defaults.)

add get_robustcov_results and cov_type extras:

Note: as it is currently, we only get cov_params for the mean parameters; the variance/scale estimate is asymptotically orthogonal under ... (symmetry).

An idea for a unit test: RLM has the Gaussian (squared loss) norm as a special case. I'm not sure it is fully unit tested, but we can compare RLM with the Gaussian norm against OLS and compare the corresponding sandwich cov_types; see the sketch below. Another option would be to define rho/norm from a symmetric density like the t distribution. We might have to add an option to replace the Huber proposal 2 scale calculation with something distribution specific, e.g. using weights corresponding to the t distribution. Then we would have other distributions to compare with. (Currently we have a Cauchy norm, but I'm not sure what the MLE would be in that case, given that the moments don't exist.) (Somewhere I read that in R the robust estimation results can be used with the sandwich package, but I don't think I want to struggle with that, at least not yet.)
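A minimal version of the RLM-vs-OLS check could look like this (simulated data; the tolerances are guesses, and the assertion on the H1-H3 options is based on the observation above that all three specialize to the OLS variance for squared loss):

```python
import numpy as np
from numpy.testing import assert_allclose
import statsmodels.api as sm
from statsmodels.robust import norms

rng = np.random.default_rng(12345)
nobs = 500
exog = sm.add_constant(rng.standard_normal((nobs, 3)))
endog = exog @ np.array([1.0, 0.5, -0.5, 0.25]) + rng.standard_normal(nobs)

res_ols = sm.OLS(endog, exog).fit()

# squared loss turns RLM into (iterated) least squares
for cov in ["H1", "H2", "H3"]:
    res_rlm = sm.RLM(endog, exog, M=norms.LeastSquares()).fit(cov=cov)
    assert_allclose(res_rlm.params, res_ols.params, rtol=1e-8)
    # all three H-type covariances should specialize to the OLS nonrobust variance
    assert_allclose(res_rlm.bse, res_ols.bse, rtol=1e-6)
```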

(One implementation detail: do we want the robust norm weights in the score function, as in IRLS, or the "psi" (derivative of rho) function? The two are equivalent; try to be consistent with GLM, which I guess means score_factor = psi(resid / scale) and score = (score_factor[:, None] * exog).sum(0).)
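A quick numerical check of that equivalence (score_factor, score_obs and score follow the GLM naming convention and are not existing RLM attributes):

```python
import numpy as np
from statsmodels.robust import norms

norm = norms.HuberT()
rng = np.random.default_rng(0)
u = rng.standard_normal(25)            # stands in for resid / scale
exog = rng.standard_normal((25, 3))

# psi form vs IRLS-weights form of the same quantity: psi(u) == weights(u) * u
score_factor = norm.psi(u)
assert np.allclose(score_factor, norm.weights(u) * u)

# per-observation score and the summed score, GLM style
score_obs = score_factor[:, None] * exog
score = score_obs.sum(0)
```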

josef-pkt commented 4 years ago

Illustrating a use case for an HC cov_type in RLM:

In #6526 I'm looking at Yuen-Welch anova, robust to some outliers because of the use of trimmed means, in comparison with oneway anova that is robust to heteroscedasticity.

That is a simple k-sample comparison with exog as a single categorical variable, where an HC cov_type would allow for different variances across the k samples; see the sketch below.
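A rough sketch of that setup (simulated data; group sizes, the injected outliers, and the HuberT norm are arbitrary illustrative choices, and the HC cov_type itself is still the missing piece):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.robust import norms

rng = np.random.default_rng(42)
# three samples with different variances, plus a few gross outliers
groups = np.repeat(["a", "b", "c"], [30, 40, 50])
endog = np.concatenate([
    1.0 + 0.5 * rng.standard_normal(30),
    1.5 + 1.0 * rng.standard_normal(40),
    2.0 + 2.0 * rng.standard_normal(50),
])
endog[:3] += 10  # outliers in the first sample

# exog is a single categorical: intercept plus dummies
exog = sm.add_constant(pd.get_dummies(pd.Series(groups), drop_first=True, dtype=float))

res = sm.RLM(endog, exog, M=norms.HuberT()).fit()
# the H1-H3 covariances assume a common error variance; an HC cov_type
# would instead allow each of the k samples its own variance
print(res.summary())
```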

Variation: using a oneway variance comparison like Levene-Welch anova, we could use the MAD or some other transformation of the residuals as endog in RLM for a simple outlier robust comparison and test of variances.

josef-pkt commented 3 years ago

A question about RLM with cluster robust standard errors: https://stats.stackexchange.com/questions/519249/outlier-robust-regression-with-clustered-standard-errors

Another idea: in GLM I added an option to attach the final WLS instance, which could then be used to compute a robust cov. RLM.fit only uses MinimalWLS for the IRLS optimization, so we would need just the weights to create the WLS instance; see the sketch below.
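A sketch of that idea, assuming the final IRLS weights in RLMResults.weights are the right ones to carry over; whether the resulting WLS sandwich is valid for RLM is exactly what would need verification:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.robust import norms

rng = np.random.default_rng(0)
nobs, n_groups = 200, 20
groups = rng.integers(n_groups, size=nobs)
exog = sm.add_constant(rng.standard_normal((nobs, 2)))
endog = exog @ np.array([1.0, 0.5, -0.5]) \
    + rng.standard_normal(n_groups)[groups] + rng.standard_normal(nobs)
endog[:5] += 15  # a few gross outliers

res_rlm = sm.RLM(endog, exog, M=norms.HuberT()).fit()

# reuse the final IRLS weights in a WLS fit and let WLS compute the
# cluster-robust covariance
res_wls = sm.WLS(endog, exog, weights=res_rlm.weights).fit(
    cov_type="cluster", cov_kwds={"groups": groups})
print(res_rlm.params)
print(res_wls.params)   # (nearly) identical to the RLM point estimates
print(res_wls.bse)      # candidate cluster-robust standard errors (to be verified)
```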

RLM uses a different version of iterated WLS than GLM; there are no working residuals to take the role of endog in WLS. In GLM, the WLS cov is the expected information matrix, not the observed information matrix (hessian). (The default cov could differ between IRLS and gradient optimizers with a non-canonical GLM link.) I never looked at that for RLM. (RLM has a linear link function, so there might not be a difference between the expected and observed hessian.)

The problem is in writing unit tests and verifying that it is "correct". Also, in the case of correlation, there will be some assumption about the interpretation of outliers and whether they spill over across observations, e.g. the three types of outliers in the time series literature.

related #3273

josef-pkt commented 2 years ago

Croux, Christophe, Geert Dhaene, and Dirk Hoorelbeke. "Robust standard errors for robust estimators." CES-Discussion paper series (DPS) 03.16 (2004): 1-20.

Includes a GMM derivation of HC and HAC cov_params for MM-estimators, with references.

josef-pkt commented 1 year ago

Interesting idea: use the RLM estimated weights in WLS to get a robust cov.

https://stats.stackexchange.com/questions/519249/outlier-robust-regression-with-clustered-standard-errors/610964#610964

I did not look at the details yet, e.g. what is the effect of the scale estimator? Does a WLS sandwich cov_params correctly take into account the "trimming" by redescending norms?

josef-pkt commented 1 year ago

another reference

Maronna, Ricardo A., Douglas Martin, and Víctor J. Yohai. Robust Statistics: Theory and Methods. Reprinted with corr. Wiley Series in Probability and Statistics. Chichester: Wiley, 2006. section 5.13.2 Estimating the asymptotic covariance matrix under heteroskedastic errors

The section has references (including the Croux et al. discussion paper) and derives the HC sandwich for the MM-estimator by stacking moment conditions. This looks like the standard form with the derivative of the moment condition instead of the hessian (#8789). We don't have a hessian in RLM, but I think we can get it easily using norm.psi_deriv; RLM also does not have score and score_obs. See the sketch below.
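For concreteness, hypothetical helpers built only from norm.psi and norm.psi_deriv, with the scale treated as fixed (these are not existing RLM methods; they use the same ingredients as the sandwich sketch in the first comment):

```python
import numpy as np

def rlm_score_obs(results):
    """Per-observation estimating equation psi(resid_i / scale) * x_i (scale held fixed)."""
    u = results.resid / results.scale
    return results.model.M.psi(u)[:, None] * results.model.exog

def rlm_hessian(results):
    """Derivative of the estimating equation w.r.t. params: -X' diag(psi'(u)) X / scale."""
    exog = results.model.exog
    u = results.resid / results.scale
    return -exog.T @ (results.model.M.psi_deriv(u)[:, None] * exog) / results.scale
```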

josef-pkt commented 8 months ago

another reference

Cheng, Tsung-Chi. “On Simultaneously Identifying Outliers and Heteroscedasticity without Specific Form.” Computational Statistics & Data Analysis 56, no. 7 (July 1, 2012): 2258–72. https://doi.org/10.1016/j.csda.2012.01.004.

Uses HC1-HC4 sandwiches for a weighted least absolute deviation (LAD) estimator. However, it uses the WLS HC cov_params, which AFAIK is incorrect because of the kink, i.e. the nondifferentiability of LAD (as in quantile regression). That is, the approach would apply to RLM with a differentiable norm.