var should take absolute value for complex numbers. (migrated from Trac #638)

thouis commented 11 years ago

Original ticket http://projects.scipy.org/numpy/ticket/638 Reported 2007-12-30 by trac user akumar, assigned to unknown.

Hi!

I was just wondering why numpy.var gives me complex variances for complex numbers.

e.g.

{{{
h1 = (randn(1000,1) + 1j *randn(1000,1)) / sqrt(2)
var(h1)
}}}

gives: {{{ (-0.00757596036094-0.0234608549341j) }}}

While I actually expect the output to be similar to that which is from var(square(abs(h1))) which gives: {{{ 0.994899762171 }}}

I feel this is an error, though I may be wrong. :-)

Thanks.

Kumar

thouis commented 11 years ago

Comment in Trac by trac user akumar, 2007-12-30

Replying to [ticket:638 akumar]:

Hi!

I was just wondering why numpy.var gives me complex variances for complex numbers.

e.g.

{{{
h1 = (randn(1000,1) + 1j *randn(1000,1)) / sqrt(2)
var(h1)
}}}
gives: {{{ (-0.00757596036094-0.0234608549341j) }}}

While I actually expect the output to be similar to that which is from {{{ var(square(abs(h1))) }}}

Wrong! Actually, it should be {{{ mean(square(abs(h1))) }}} which gives: {{{ 0.995184304741 }}}

Thanks.

Kumar
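
The complex value in the report looks like what one gets by averaging the squared deviations without taking the modulus. That this is what `var` computed at the time is an assumption here, not something the ticket states. A minimal sketch of the two quantities, using a seeded generator and variable names that are illustrative only:

```python
import numpy as np

# Seed and generator are arbitrary choices for reproducibility (not from the ticket).
rng = np.random.default_rng(0)
h1 = (rng.standard_normal(1000) + 1j * rng.standard_normal(1000)) / np.sqrt(2)

centered = h1 - h1.mean()

# Squaring without the modulus: the result is complex and small for this sample.
plain_square = np.mean(centered ** 2)

# Squaring the modulus: the result is real, non-negative, and close to 1 here.
abs_square = np.mean(np.abs(centered) ** 2)

print(plain_square)
print(abs_square)
```

The second quantity is the one the reporter expected.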

thouis commented 11 years ago

Comment in Trac by atmention:rkern, 2007-12-30

Can you point me to a reference defining the variance over complex numbers in the way you think it is? Near as I can tell, the only meaningful definition is the one implemented: treating complex numbers as 2-vectors with the variance being computed over each dimension independently. The formula you give is indeed valid for real numbers, but it does not make sense to extend it to complex numbers without change.

thouis commented 11 years ago

Comment in Trac by atmention:rkern, 2007-12-30

Sorry, I was incorrect. We don't do it the way I thought we did. Let me dive in.

thouis commented 11 years ago

Comment in Trac by trac user kumanna, 2007-12-30

Dear Robert,

Maybe I wasn't being clear enough. What I would like to request you to implement is the definition given here:

http://en.wikipedia.org/wiki/Variance#Generalizations

In my opinion, this is what is done in most cases, and in most signal processing books. I'll try to give you some evidence if this is not convincing enough.

Thanks!

Kumar

thouis commented 11 years ago

Comment in Trac by atmention:rkern, 2007-12-30

Hmm. You'll note, though, that the result is a matrix, not a single real or complex value. It is the result of finding the full covariance matrix of the 2-vector (real, imag). You really want to use {{{cov()}}}, not {{{var()}}}. Probably, we should just disallow complex values for {{{var()}}}.

thouis commented 11 years ago

Comment in Trac by trac user kumanna, 2007-12-30

Replying to [comment:5 rkern]:

Hmm. You'll note, though, that the result is a matrix, not a single real or complex value. It is the result of finding the full covariance matrix of the 2-vector (real, imag). You really want to use {{{cov()}}}, not {{{var()}}}. Probably, we should just disallow complex values for {{{var()}}}.

Well, that is true. However, should you wish to be somewhat compatible with Matlab or Octave, you wouldn't want to do that.

And, for some reason, it doesn't seem to do the same thing:

h1 = (randn(1000) + 1j *randn(1000)) / sqrt(2)
cov(h1)
Out: array(0.46705704279061899)

Huh?

mean(multiply(h1, conj(h1))) - multiply(mean(h1), conj(mean(h1)))
Out: (0.947439186641+0j)

Am I going wrong somewhere?

Thanks.

Kumar
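
The expression in the comment above is the moment form of the mean squared distance from the mean. A small sketch checking that the two forms agree numerically; the seed and variable names are illustrative assumptions, and it says nothing about what `cov(h1)` did at the time:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
h1 = (rng.standard_normal(1000) + 1j * rng.standard_normal(1000)) / np.sqrt(2)

m = h1.mean()

# E[h conj(h)] - E[h] conj(E[h]), as written in the comment above.
moment_form = np.mean(h1 * np.conj(h1)) - m * np.conj(m)

# Mean squared modulus of the deviations from the mean.
centered_form = np.mean(np.abs(h1 - m) ** 2)

print(moment_form, centered_form)
```

The moment form comes back as a complex number with zero imaginary part, which matches the `(…+0j)` shape of the output shown above.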

thouis commented 11 years ago

Comment in Trac by atmention:rkern, 2007-12-30

I honestly don't care about compatibility with Matlab or Octave where I think they're wrong.

Note that the formula given in the Wikipedia article does not match the sentence following it. I think the sentence makes more sense than the formula. Complex numbers have two components. The appropriate (default) measure of spread of a distribution of complex numbers would be an ellipse (covariance matrix) rather than a circle (real scalar value). As far as statistical distributions are concerned, complex numbers are no different than 2-vectors. I need another reference to be convinced otherwise.

You can use {{{cov()}}} by separating the real and imaginary components. {{{cov([z.real, z.imag])}}}.
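
A sketch of this workaround, assuming a sample `z` built as in the report (generator, seed, and names are illustrative). It passes `ddof=0` because `np.cov` normalises by N-1 by default; with that choice the trace of the 2x2 matrix equals the mean squared modulus of the centered samples, which is the scalar discussed elsewhere in this ticket.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, for reproducibility only
z = (rng.standard_normal(1000) + 1j * rng.standard_normal(1000)) / np.sqrt(2)

# 2x2 covariance matrix of the (real, imag) components.
# ddof=0 normalises by N so the result is comparable with a plain mean.
c = np.cov([z.real, z.imag], ddof=0)

# Scalar spread: mean squared modulus of the deviations from the mean.
scalar_spread = np.mean(np.abs(z - z.mean()) ** 2)

print(c)
print(np.trace(c), scalar_spread)
```

The off-diagonal entries of `c` carry the information that the scalar discards (the orientation of the ellipse).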

thouis commented 11 years ago

Comment in Trac by trac user kumanna, 2007-12-31

Dear Robert,

Replying to [comment:7 rkern]:

Note that the formula given in the Wikipedia article does not match the sentence following it. I think the sentence makes more sense than the formula. Complex numbers have two components. The appropriate (default) measure of spread of a distribution of complex numbers would be an ellipse (covariance matrix) rather than a circle (real scalar value). As far as statistical distributions are concerned, complex numbers are no different than 2-vectors. I need another reference to be convinced otherwise.

Well, I do not wish to argue over this, since you have provided a workaround. However, I would still like you to see the following before closing this ticket:

http://books.google.com/books?id=seoUuxiqG-oC&pg=PA408&dq=Variance+of+complex+random+variables&lr=&as_brr=0&sig=FTeiSD2WgxYYSg_F9vp5GiBMjAY
http://books.google.com/books?id=CVJf2vHEF4cC&pg=PA32&dq=Variance+of+complex+random+variables&lr=&as_brr=0&sig=tt4_9H3Ko5KQrW_aDN7JQKC933o
http://books.google.com/books?id=JKhw0jiqlckC&pg=PA194&dq=Variance+of+complex+random+variables&lr=&as_brr=0&sig=eumdxuzzpm1KtzMIbOY1KwAncDY

Of course, if you feel that is for `random variables' and not relevant for a deterministic array, like in our case, you may close this ticket.

You can use {{{cov()}}} by separating the real and imaginary components. {{{cov([z.real, z.imag])}}}.

Right, I see what you mean. But still, I would request you to give it one more thought before closing my request. :-)

Thanks for the time!

Kumar

thouis commented 11 years ago

Comment in Trac by trac user kumanna, 2007-12-31

Replying to [comment:8 kumanna]:

You can use {{{cov()}}} by separating the real and imaginary components. {{{cov([z.real, z.imag])}}}.

Right, I see what you mean. But still, I would request you to give it one more thought before closing my request. :-)

I am still not satisfied with this, as I have to take the trace of the covariance matrix to get what I want. But if this is the way it has to be, so be it. :-)

Kumar

thouis commented 11 years ago

Comment in Trac by atmention:charris, 2007-12-31

It's certainly possible to justify using the squared length, i.e., the trace of the covariance matrix. For instance, in taking a least squares approach without reference to any possible Gaussian statistics it is quite natural, and is often used in deriving the Kalman measurement update. Curiously, it is also what falls out of the Clifford algebra in an n-dimensional vector space with a Euclidean inner product when the vectors are squared and the average taken. That said, I don't have much feeling one way or the other about how the variance is defined for complex numbers, although either the single squared norm or the full covariance matrix would seem slightly more natural to me.

thouis commented 11 years ago

Comment in Trac by atmention:cournape, 2008-01-12

I thought a bit about it since the discussion on the numpy ML, and I have to say I disagree with Robert on this one. I don't think the only meaningful definition is to treat C as R^2. Variance is a special case of covariance, and for complex random variables, the covariance of X and Y, assuming they are centered, is E[X conj(Y)], with conj(Y) the conjugate of Y. This is the definition used in statistical signal processing (at least the one I have always seen).

When considering complex random variables, some properties of the real part and the imaginary part are often assumed (that they have the same variance, for example). For instance, for complex Gaussian random variables, by definition Z = X + jY, with X and Y independent Gaussian with the same variance \sigma, so Z has a variance equal to 2 * \sigma. That is the trace of the covariance matrix of the real random vector (X, Y), and it is also what the definition \sigma_Z \triangleq \mathbb{E}[Z \bar{Z}] gives. With Robert's definition, even for scalar complex random variables, the density of a complex normal would involve matrices; a definition using only scalars is more appealing IMHO.

Those two arguments, variance as a special case of the covariance of two variables, and staying scalar for complex random variables, seem pretty strong to me.
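
The link between the scalar definition and the trace can be written out in a few lines. The notation is an assumption added for illustration: \mu_Z = \mu_X + j\mu_Y is the mean of Z, and \Sigma is the 2x2 covariance matrix of the real vector (X, Y). The cross terms cancel because (a + jb)(a - jb) = a^2 + b^2, so independence of X and Y is not needed for this step.

```latex
% Assumed notation: Z = X + jY, \mu_Z = \mu_X + j\mu_Y,
% \Sigma = covariance matrix of the real random vector (X, Y).
\begin{align*}
\operatorname{Var}(Z)
  &\triangleq \mathbb{E}\big[(Z-\mu_Z)\,\overline{(Z-\mu_Z)}\big] \\
  &= \mathbb{E}\big[(X-\mu_X)^2 + (Y-\mu_Y)^2\big] \\
  &= \operatorname{Var}(X) + \operatorname{Var}(Y) \\
  &= \operatorname{tr}(\Sigma)
\end{align*}
```

With Var(X) = Var(Y) = \sigma, as in the complex Gaussian example, this gives 2 * \sigma. For centered Z it reduces to \mathbb{E}[Z \bar{Z}].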

thouis commented 11 years ago

Comment in Trac by atmention:charris, 2008-03-11

I believe we talked Robert into changing the behavior of var, but it hasn't happened yet.

thouis commented 11 years ago

Comment in Trac by atmention:teoliphant, 2008-03-27

Fixed in r4945.
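
For reference, current NumPy releases document that `np.var` takes the absolute value before squaring for complex input, so the result is real and non-negative. Whether r4945 is the exact change that introduced this is not verified here. A quick check, with an arbitrary seed and illustrative names:

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, for reproducibility only
h1 = (rng.standard_normal(1000) + 1j * rng.standard_normal(1000)) / np.sqrt(2)

# In current NumPy this is a real float, not a complex number.
v = np.var(h1)

# Same quantity written out by hand: mean squared modulus of the deviations.
by_hand = np.mean(np.abs(h1 - h1.mean()) ** 2)

print(v, by_hand)
```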

thouis / numpy-trac-migration

var should take absolute value for complex numbers. (migrated from Trac #638) #1245