statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: variance estimation and tests based on pairwise absolute differences #6798

Open josef-pkt opened 4 years ago

josef-pkt commented 4 years ago

I just ran into something related to this (*)

The sum of all pairwise absolute differences, normalized by the number of pairs, can be used as an estimate of scale or dispersion. It's also referred to as the Gini mean difference (related to the Gini coefficient).

I guess this is related to #3380, which mostly uses sequential differences for estimating variance/scale.

Carina Gerstenberger, Daniel Vogel & Martin Wendler (2019): Tests for Scale Changes Based on Pairwise Differences, Journal of the American Statistical Association, DOI: 10.1080/01621459.2019.1629938

Cited by 2 arXiv papers that might be interesting.

(*) Serfling, Robert Joseph. 2008. “L-Estimates.” In Approximation Theorems of Mathematical Statistics, 262–91. John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470316481.ch8. In chapter 8, p. 263, Example A shows a shortcut for computing the sum of pairwise absolute differences.

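import numpy as np

# x: 1-D sample array with nobs = len(x) observations (data not shown)
# direct computation: mean of |x_i - x_j| over all pairs i < j
idx = np.triu_indices(nobs, k=1)
np.sum(np.abs(x[idx[0]] - x[idx[1]])) * 2 / (nobs * (nobs - 1))
1.67007648797246

# shortcut via order statistics: sort once, then take a weighted sum
xs = np.sort(x)
ii = np.arange(1, nobs + 1)
np.sum((2 * ii - nobs - 1) * xs) * 2 / (nobs * (nobs - 1))
1.67007648797246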
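A self-contained sketch of the two computations above, wrapped as functions so they can be checked against each other. The function names `gmd_pairwise` and `gmd_sorted` and the simulated sample are assumptions for illustration, not existing statsmodels API. The shortcut relies on the identity sum_{i<j} |x_i - x_j| = sum_i (2 i - nobs - 1) x_(i), where x_(i) is the i-th order statistic.

```python
import numpy as np


def gmd_pairwise(x):
    """Gini mean difference by enumerating all pairs (O(n^2) time and memory)."""
    x = np.asarray(x, dtype=float)
    nobs = x.shape[0]
    i, j = np.triu_indices(nobs, k=1)  # index pairs with i < j
    return np.abs(x[i] - x[j]).sum() * 2 / (nobs * (nobs - 1))


def gmd_sorted(x):
    """Same quantity from the order statistics (O(n log n) because of the sort)."""
    xs = np.sort(np.asarray(x, dtype=float))
    nobs = xs.shape[0]
    ii = np.arange(1, nobs + 1)  # 1-based ranks
    return ((2 * ii - nobs - 1) * xs).sum() * 2 / (nobs * (nobs - 1))


# hypothetical data, only used to compare the two versions
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
print(gmd_pairwise(sample), gmd_sorted(sample))
```

The two printed values should agree up to floating point error. The sorted version avoids building the nobs * (nobs - 1) / 2 pairs, which matters for large samples.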

see also computation of L-moments https://en.wikipedia.org/wiki/L-moment#Sample_L-moments

Aside: Rousseeuw and Croux use a median or quantile of the pairwise absolute differences, which needs a different computational speedup https://en.wikipedia.org/wiki/Robust_measures_of_scale#Absolute_pairwise_differences

I don't find an issue or PR for their S_n and Q_n. I looked at it in the context of robust scale estimates, but it looked too expensive to compute or too much work to implement efficiently for the added benefits.
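For reference, a naive sketch of Q_n that makes the cost concrete: it materializes all pairwise absolute differences and picks an order statistic. This follows the usual definition, the k-th smallest |x_i - x_j| for i < j with k = C(h, 2) and h = nobs // 2 + 1, scaled by 2.2219 for consistency at the normal distribution. The function name `qn_scale_naive` is an assumption, finite-sample correction factors are omitted, and this is not the efficient algorithm that a real implementation would need.

```python
import numpy as np


def qn_scale_naive(x, c=2.2219):
    """Naive O(n^2) Q_n: c times the k-th smallest pairwise absolute difference.

    Requires at least two observations. No finite-sample correction is applied.
    """
    x = np.asarray(x, dtype=float)
    nobs = x.shape[0]
    h = nobs // 2 + 1
    k = h * (h - 1) // 2  # k = C(h, 2), a 1-based order statistic
    i, j = np.triu_indices(nobs, k=1)  # index pairs with i < j
    diffs = np.abs(x[i] - x[j])
    # np.partition puts the k-th smallest value at 0-based index k - 1
    return c * np.partition(diffs, k - 1)[k - 1]


# a single gross outlier does not move the estimate in this small example
print(qn_scale_naive([1, 2, 3, 4, 5]), qn_scale_naive([1, 2, 3, 4, 500]))
```

Memory and time both grow quadratically in nobs here, which is the expense referred to above.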

esmucler commented 4 years ago

I've been working on an implementation of the Q_n estimator, which I always thought was a very neat idea, https://github.com/esmucler/statsmodels/tree/robust-Qn-scale, with the thought of creating a PR when ready.

Comments and suggestions are very much welcome

josef-pkt commented 4 years ago

@esmucler very good, thank you. I'm not an expert in Cython, so I'm no help there. We need some unit tests against verified numbers; those should be available in an R package.

A PR will be welcome.