statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: OLS with complex-valued data #3528

Open nbedou opened 7 years ago

nbedou commented 7 years ago

original title: "Exception of RegressionResult.summary() when performing OLS with complex-valued data"

import numpy as np
import statsmodels.formula.api as sm

x = np.linspace(0, 10, 100)
# Edit: using a complex-valued x, for example "x = np.linspace(0, 10 + 5j, 100)" works too

beta = 5. + 2j
y = beta * x

est = sm.OLS(y, x).fit()
assert np.allclose(beta, est.params[0])  # OK

print(est.summary())

I am performing an ordinary least squares (OLS) regression on complex-valued data. The regression itself works as expected. However, calling the method summary() raises the following exception: TypeError: ufunc 'chdtrc' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

statsmodels version: 0.8.0 numpy version: 1.11.2 Python 3.5.2 Windows 7 64 bits

josef-pkt commented 7 years ago

Interesting, I have never heard of anyone using complex-valued data with this.

What is your use case? And do the regression results look correct?

The problem here is that some of the distribution functions in scipy.special do not support complex numbers. And I have no idea what the statistical interpretation or definition of, for example, the cdf of the normal or chisquare distribution is. Do you have a reference?

As a possible solution:

Aside: many, but not all, models support complex numbers internally in the estimation, which we use in some cases for complex step derivatives. But there are currently no examples or unit tests for complex numbers beyond complex step derivatives.
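The complex-step derivative trick referred to here is the standard identity f'(x) ≈ Im(f(x + ih)) / h; a minimal sketch (the helper name is made up for illustration):

```python
import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ~ Im(f(x + i*h)) / h; no subtractive cancellation, so h can
    # be tiny (requires f to be analytic and safe for complex input)
    return np.imag(f(x + 1j * h)) / h

# example: derivative of exp at x = 1 equals exp(1)
d_exp = complex_step_derivative(np.exp, 1.0)
```

This is why models that support a complex dtype internally get complex-step derivatives almost for free.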

josef-pkt commented 7 years ago

Here is a stackoverflow question http://stats.stackexchange.com/questions/66088/analysis-with-complex-data-anything-different

I think the code needs to be checked for transpose versus conjugate transpose, especially if exog is also complex. Some of the quadratic terms become real if we use the conjugate transpose. So maybe the quadratic forms for the chisquare hypothesis tests are real if properly defined, but even then tvalues would be complex divided by real. (Aside: sqrt will not round-trip with complex conjugate power 2 ???)

>>> x = np.random.randn(100, 4) + 1j * np.random.randn(100, 4)
>>> x.T.dot(x)
array([[ 23.91181839 -5.32691j   , -11.06585561-11.15631374j,
         -9.17630698 +5.23187909j,  -0.39012032 +6.80125011j],
       [-11.06585561-11.15631374j,  -4.23147304+36.15461905j,
        -12.69959335-17.25114591j,  -2.09730406-14.52219175j],
       [ -9.17630698 +5.23187909j, -12.69959335-17.25114591j,
         22.86545481 -0.11291921j,   1.34340469 -2.17636526j],
       [ -0.39012032 +6.80125011j,  -2.09730406-14.52219175j,
          1.34340469 -2.17636526j, -36.26866425-29.63732456j]])

>>> np.diag(x.conj().T.dot(x))
array([ 199.67422289+0.j,  211.25381760+0.j,  159.30436393+0.j,
        175.19411159+0.j])
>>> np.diag(x.T.dot(x))
array([ 23.91181839 -5.32691j   ,  -4.23147304+36.15461905j,
        22.86545481 -0.11291921j, -36.26866425-29.63732456j]) 

>>> np.real_if_close(np.diag(x.conj().T.dot(x)))
array([ 199.67422289,  211.2538176 ,  159.30436393,  175.19411159])

>>> np.real_if_close(np.diag(np.linalg.inv(x.conj().T.dot(x))))
array([ 0.00508739,  0.00476745,  0.00643579,  0.0058142 ])

>>> np.real_if_close(x[:, 0].conj().dot(x[:, 0]))
array(199.67422288575872)
josef-pkt commented 7 years ago

It seems difficult to find any literature on this, and some that I found is restricted pay access that I don't get.

(The only applications that I can think of with complex endog are in frequency domain, fft or characteristic functions.)

nbedou commented 7 years ago

My use case consists of fitting two Fourier spectra, hence the complex-valued data. As shown in the updated code in my first post, the parameters of the regression are found successfully. However, I wouldn't trust any of the statistical tests and other quality coefficients until I am sure they work as expected with complex-valued data (it's likely they don't).

I am not familiar with statistics with complex data. If I find some time I'll try to find references about this.
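A minimal version of this use case (simulated data, not the actual spectra) reduces to complex least squares on FFT output; `np.linalg.lstsq` handles the complex case directly:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(256)
ref_spec = np.fft.rfft(np.cos(0.3 * t))       # reference spectrum
true_beta = 0.8 * np.exp(0.5j)                # complex gain to recover
noise = 0.01 * (rng.standard_normal(ref_spec.shape)
                + 1j * rng.standard_normal(ref_spec.shape))
meas_spec = true_beta * ref_spec + noise      # "measured" spectrum

# complex least squares: the parameter comes out complex,
# like params in the OP's OLS example
beta_hat, *_ = np.linalg.lstsq(ref_spec[:, None], meas_spec, rcond=None)
```

As in the issue, the point estimate is fine; it is the inference (bse, tvalues, pvalues) that is undefined.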

josef-pkt commented 7 years ago

Wikipedia has an interesting page on complex normal distribution https://en.wikipedia.org/wiki/Complex_normal_distribution

I found two overview articles covering the basics of complex-valued data analysis and statistics. The 2011 article has more of the very basics for complex models; the 2015 article has the convergence to the complex normal distribution. (I read most of the 2011 article, but stopped reading the 2015 one.) With these references as a starting point it will be easier to find others. There seems to be some content in signal processing books, but I have never looked at any of those.

"Unfortunately, many fundamental statistical tools for handling complex-valued parameter estimators are missing or scattered in open literature." quote from Delmas Abeida, but the other has a similar sentence

Delmas, Jean Pierre, and Habti Abeida. 2015. “Survey and Some New Results on Performance Analysis of Complex-Valued Parameter Estimators.” Signal Processing 111 (June): 210–21. doi:10.1016/j.sigpro.2014.12.009.

Ollila, E., V. Koivunen, and H. V. Poor. 2011. “Complex-Valued Signal Processing - Essential Models, Tools and Statistics.” In 2011 Information Theory and Applications Workshop, 1–10. doi:10.1109/ITA.2011.5743596.

Edit: another similar intro article, a bit easier to read, with more examples explaining the terms (e.g. circular versus proper):

Adali, T., and P. J. Schreier. 2014. “Optimization and Estimation of Complex-Valued Signals: Theory and Applications in Filtering and Blind Source Separation.” IEEE Signal Processing Magazine 31 (5): 112–28. doi:10.1109/MSP.2013.2287951.

A few thoughts or points:

design question:

- wrapping complex data into an internally real bivariate model, or
- writing estimators and inference that work with the complex dtype directly

My guess is that the first is easier, since we already have multivariate models like MultivariateOLS.
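The first option can be sketched with the standard real-composite ("stacked") representation (a sketch of the idea, not statsmodels API): a complex system y = X b becomes one real system in [Re b, Im b].

```python
import numpy as np

def complex_to_real_ols(y, X):
    """Solve the complex LS problem y = X @ b by stacking real and
    imaginary parts into one real system (sketch, not statsmodels code)."""
    Xr, Xi = X.real, X.imag
    # Re(X b) = Xr br - Xi bi,  Im(X b) = Xi br + Xr bi
    A = np.block([[Xr, -Xi], [Xi, Xr]])
    z = np.concatenate([y.real, y.imag])
    sol, *_ = np.linalg.lstsq(A, z, rcond=None)
    k = X.shape[1]
    return sol[:k] + 1j * sol[k:]

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2)) + 1j * rng.standard_normal((50, 2))
b = np.array([1 + 2j, -0.5 + 1j])
b_hat = complex_to_real_ols(X @ b, X)
```

The stacked real LS solution coincides with the complex LS solution defined via the conjugate transpose, which is one argument for this wrapping.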

josef-pkt commented 7 years ago

I changed the title to indicate better that this is an enhancement, given that it is currently not supported.

Related aside: we should add a Warning if exog has any dtype other than float64.
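Such a check could look like the following (a sketch; the helper name is made up and this is not statsmodels' actual validation code):

```python
import warnings
import numpy as np

def check_exog_dtype(exog):
    # warn on anything other than float64, as suggested above
    exog = np.asarray(exog)
    if exog.dtype != np.float64:
        warnings.warn(
            f"exog has dtype {exog.dtype}, expected float64; "
            "results for other dtypes are not supported",
            UserWarning)
    return exog
```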

josef-pkt commented 7 years ago

This one looks like it has to go on the background reading list.

Same authors as the 2011 paper above, plus Tyler (elliptically symmetric distributions are spread over a few issues like #3280, #3266).

Ollila, E., D. E. Tyler, V. Koivunen, and H. V. Poor. 2012. “Complex Elliptically Symmetric Distributions: Survey, New Results and Applications.” IEEE Transactions on Signal Processing 60 (11): 5597–5625. doi:10.1109/TSP.2012.2212433.

josef-pkt commented 7 years ago

(I added this comment initially by mistake to #3530 )

One topic that has recently become popular in signal processing is moving away from the circularity assumption,

e.g. the following article and the ones citing it:

Adali, T., P. J. Schreier, and L. L. Scharf. 2011. “Complex-Valued Signal Processing: The Proper Way to Deal With Impropriety.” IEEE Transactions on Signal Processing 59 (11): 5101–25. doi:10.1109/TSP.2011.2162954.

(Based on reading the abstract, it looks too detailed for now.)

josef-pkt commented 10 months ago

new issue #9053 is duplicate

quick try again, using a modified example from the first comment

It looks more like OLS computes a vectorized OLS, treating the real and imaginary parts as separate regressions. Scale and similar statistics don't take an abs, i.e. real and imaginary parts stay separate. (I think that's what we would need for complex step derivatives.)

possible cases

import numpy as np
import statsmodels.api as sm

x = np.linspace(0, 10, 100)
# Edit: using a complex-valued x, for example "x = np.linspace(0, 10 + 5j, 100)" works too

beta = 5. + 2j
y = 2 + beta * x + 0.01 * (np.random.randn(100) + np.random.randn(100) * 1j)

est = sm.OLS(y, sm.add_constant(x)).fit()
#assert np.allclose([2, beta], est.params, atol=1e-3)  # OK
print(est.params)
print(est.bse)
print(est.tvalues)
print(est.pvalues)

[1.99963621-1.40128135e-03j 5.00017512+2.00007937e+00j]
[2.16217394e-04+0.00096241j 3.73557373e-05+0.00016628j]
[  442.97509614 -1978.21780668j 17882.15767065-26054.27038404j]
[2.50471394e-228 0.00000000e+000]

est.scale, est.mse_resid, est.mse_model, est.ssr, est.df_resid, est.df_model, est.model.nobs
((-2.231855670074959e-05+1.0561344526467748e-05j),
 (-2.231855670074959e-05+1.0561344526467748e-05j),
 (17854.75423725502+17004.637336882613j),
 (-0.0021872185566734603+0.0010350117635938394j),
 98.0,
 1.0,
 100.0)

Update: only params correspond to the vectorized OLS; the other statistics differ.

est = sm.OLS(y.real, sm.add_constant(x)).fit()
#assert np.allclose([2, beta], est.params, atol=1e-3)  # OK
print(est.params)
print(est.bse)
print(est.tvalues)
print(est.pvalues)
est.scale, est.mse_resid, est.mse_model, est.ssr, est.df_resid, est.df_model, est.model.nobs

[1.99963621 5.00017512]
[0.00182366 0.00031507]
[ 1096.49847116 15869.95034259]
[3.57102361e-202 6.58996349e-316]

(8.439645918533599e-05,
 8.439645918533599e-05,
 21255.69756213966,
 0.008270853000162927,
 98.0,
 1.0,
 100.0)

est = sm.OLS(y.imag, sm.add_constant(x)).fit()
#assert np.allclose([2, beta], est.params, atol=1e-3)  # OK
print(est.params)
print(est.bse)
print(est.tvalues)
print(est.pvalues)
est.scale, est.mse_resid, est.mse_model, est.ssr, est.df_resid, est.df_model, est.model.nobs

[-1.40128135e-03  2.00007937e+00]
[0.00205066 0.00035429]
[-6.83331948e-01  5.64529876e+03]
[4.96009512e-001 6.45524707e-272]

(0.00010671501588608387,
 0.00010671501588608387,
 3400.943324884639,
 0.010458071556836219,
 98.0,
 1.0,
 100.0)

How should cov_params, specifically the scale for it, be defined?
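One candidate answer, continuing the example above (a sketch of one option, not a settled definition): compute the scale from the Hermitian residual product, which is real and nonnegative by construction, and compare with the non-conjugate product that produces the complex scale shown above.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 100)
X = np.column_stack([np.ones(100), x])
beta = np.array([2.0 + 0j, 5.0 + 2j])
y = X @ beta + 0.01 * (rng.standard_normal(100) + 1j * rng.standard_normal(100))

params, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ params
df_resid = 100 - 2

scale_plain = resid @ resid / df_resid                 # no conjugation: complex
scale_herm = np.real(resid.conj() @ resid) / df_resid  # Hermitian: real >= 0
```

With the Hermitian scale, cov_params would be real times the (Hermitian) inverse Gram matrix, but whether that is the statistically correct definition is exactly the open question.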

josef-pkt commented 9 months ago

search for regression, least squares

Chang-Jun et al. looks good as an intro, with main references Miller 1973 (basic linear least squares) and Picinbono 1994, 1995 (real and imaginary parts are correlated, widely linear estimation).

Chang-Jun, Y. a. N., Zhang San-Guo, and Ning Wei. “Estimations of the Improper Linear Regression Models with Complex-Valued Data.” Journal of University of Chinese Academy of Sciences, no. 2 (March 15, 2012): 145. https://doi.org/10.7523/j.issn.2095-6134.2012.2.001. (although it has 0 citations in google scholar)

AFAICS, improper or noncircular residuals/noise have correlated real and imaginary parts. Complex OLS is still consistent but not efficient. Picinbono et al. 1995 introduced widely linear estimation for the case that exog is complex. (AFAIU from skimming)

Also, the two definitions of the covariance for complex variables, x^H x versus x^T x (i.e. with or without conjugation), correspond to the covariance and pseudo-covariance of improper complex variables.
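The two definitions can be checked numerically (a sketch; the noise here is simulated circular Gaussian, for which the pseudo-covariance should vanish):

```python
import numpy as np

rng = np.random.default_rng(3)
# proper/circular complex noise: independent real and imaginary parts
# with equal variance, scaled so that E|z|^2 = 1
z = (rng.standard_normal(100_000) + 1j * rng.standard_normal(100_000)) / np.sqrt(2)

cov = np.mean(z * np.conj(z))   # E[z z^H]-style covariance: real, ~1
pseudo_cov = np.mean(z * z)     # E[z z^T]-style pseudo-covariance: ~0
```

For improper noise the pseudo-covariance is nonzero, which is what the widely linear estimators exploit.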

TBC

josef-pkt commented 9 months ago

I still don't find much on inference for complex-valued linear least squares.

Basic idea, AFAIU:

The complex normal distribution is "equivalent" to a bivariate normal distribution. If params is k-variate complex normal, this should be "equivalent" to a 2*k-variate normal distribution (with possibly some restriction on the covariance).

Hypothesis testing for a complex parameter z = u + i v (i = sqrt(-1)):

- H0: z = 0 corresponds to the joint hypothesis u = 0 and v = 0, so we have two restrictions for the bivariate normal distribution.
- H0: z = z_0 corresponds to the restrictions u = u_0 and v = v_0 (we need to specify both real and imaginary parts in H0).
- H0: v = 0, u unspecified: hypothesis that z is real, i.e. the imaginary part is zero.

So, for these H0 we can use the F or chi2 distribution with 2 degrees of freedom (restrictions).

Related: asymptotics for the nonlinear least squares version (uses the assumption that real and imaginary parts of the noise are uncorrelated):

Debasis Kundu (1991) Asymptotic properties of the complex valued nonlinear regression model, Communications in Statistics - Theory and Methods, 20:12, 3793-3803, DOI: 10.1080/03610929108830741

Similarly for inference: we can either define result statistics as joint/combined statistics, e.g. u^2 + v^2 (simple sum of variances), or define result statistics for the bivariate interpretation: var(u), var(v), cov(u, v).

Not checked yet: do we use weighted sums, e.g. [u, v] cov(u, v)^{-1} [u, v]' (like a Mahalanobis distance, treating z as bivariate [u, v])?
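That weighted quadratic form would be a standard Wald statistic; a sketch, treating the estimate as bivariate (the function name and the inputs u_hat, v_hat, cov_uv are hypothetical, not statsmodels API):

```python
import numpy as np
from scipy import stats

def wald_complex_zero(u_hat, v_hat, cov_uv):
    # quadratic form b' cov^{-1} b (Mahalanobis-type);
    # chi2 with 2 df under H0: z = u + i v = 0
    b = np.array([u_hat, v_hat])
    stat = b @ np.linalg.solve(cov_uv, b)
    pvalue = stats.chi2.sf(stat, df=2)
    return stat, pvalue
```

This uses the full 2x2 covariance, so it stays valid for improper (correlated real/imaginary) estimates, unlike the simple sum u^2 + v^2.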

josef-pkt commented 9 months ago

I'm giving up here

Miller 1973 looks good for the proper/circular case. He also requires the variances of the real and imaginary parts to be equal; uncorrelated alone does not seem to be enough.

The improper, noncircular case and widely linear estimation are more difficult and require some understanding of noncircular complex random variables (including looking at the definition of the complex normal distribution again).

Adali et al looks like a good overview, but requires work to go through the details

Adali, Tülay, Peter J. Schreier, and Louis L. Scharf. “Complex-Valued Signal Processing: The Proper Way to Deal With Impropriety.” IEEE Transactions on Signal Processing 59, no. 11 (November 2011): 5101–25. https://doi.org/10.1109/TSP.2011.2162954.

Picinbono, B. “On Circularity.” IEEE Transactions on Signal Processing 42, no. 12 (December 1994): 3473–82. https://doi.org/10.1109/78.340781.

Picinbono, B., and P. Chevalier. “Widely Linear Estimation with Complex Data.” IEEE Transactions on Signal Processing 43, no. 8 (August 1995): 2030–33. https://doi.org/10.1109/78.403373.

section 3.1 in the following has a list of properties of circular complex random variables (I did not look at the other parts)

Agüero, Juan C., Juan I. Yuz, Graham C. Goodwin, and Ramón A. Delgado. “On the Equivalence of Time and Frequency Domain Maximum Likelihood Estimation.” Automatica 46, no. 2 (February 1, 2010): 260–70. https://doi.org/10.1016/j.automatica.2009.10.038.

josef-pkt commented 9 months ago

Some of the comments above are incorrect, e.g. circularity does not imply that real and imaginary parts are uncorrelated. An assumption for circularity is that the pseudo-covariance is zero. https://en.wikipedia.org/wiki/Complex_normal_distribution#Distribution_of_real_and_imaginary_parts

Or rather, several definitions are mixed up: circular versus second-order circular/proper.

Update: if the complex random variable is scalar (and normally distributed?), then proper/circular implies that the correlation between real and imaginary parts is zero. In the multivariate case there is some condition that there are equal numbers of positive and negative parts in the pseudo-covariance so they cancel even though they are nonzero, but I don't really understand that.

Update 2 (I did not manage to give up on the topic): more background in #9064.