haskell / statistics

A fast, high quality library for computing with statistics in Haskell.
http://hackage.haskell.org/package/statistics
BSD 2-Clause "Simplified" License
299 stars 67 forks source link

Strange type signature of DCT #16

Closed Shimuuar closed 12 years ago

Shimuuar commented 13 years ago

Signature of dct and idct both are :: Vector (Complex Double) → Vector Double. It's implied that dct is invertible but it's obviously not. Only n out 2n degrees of freedom survive transformation and this information couldn't be recovered. Probably it should be Vector Double → Vector Double

Also different variants of DCT exists wikipedia describe no less than five. It would be helpful to describe which one is actually implemented.

bos commented 13 years ago

I fixed the signatures in 9801ebbc1a47db4c8ffcb778159fc97c8964b2d2. The Complex-valued variants of the functions remain, because they're more convenient for use with the new kernel density code.

bos commented 12 years ago

Why did you reopen this?

Shimuuar commented 12 years ago

I looks like comment was lost somehow. I argue that dct_ and idct_ should be removed.

  1. Transformation matrix of DCT is real valued and cannot mix real and imaginary parts of vector. Since we just drop imaginary part of vector dct_ [0 :+ a, 0:+ b] ≡ [0, 0]. That's not the case: dct_ [0 :+ 1, 0] = [0.0,1.414213562373095]. So it's not clear what transformation actually dct_ compute.
  2. KDE estimation code uses real-valued DCT since commit ccc45e5298fcf1df42571625b53d4e7d196364fd. Real value vectors were specifically converted to complex ones in order to use dct_
Shimuuar commented 12 years ago

I've done some math and found that way dct and idct are expressed via DFT correct only if input is real. Otherwise it will produce some meaningless output. So both 'dctandidct` shouldn't be exported. If there is no objection I'll drop them from export list.

bos commented 12 years ago

Maybe document them as having this behaviour instead, and/or change their names. I actually use them in their current form (but a name change wouldn't be a big deal).

Shimuuar commented 12 years ago

Thing I really dislike about these function is that they have precondition which is easy to break accidentaly. Imaginary part of all numbers must be equal to zero. It may be equal to zero but some changes in other part of program may introduce it and it's not clear what should be done with it.

I think documenting them is fine but they should be renamed to reflect their unsafeness. I'm not sure about name choice. unsafe is usually reserved for things that may blow up. These functions are just more likely to return garbage.

Another option is to change definition to drop imaginary part of vector. It have well defined semantics. Currently it does some unknown linear transformation of linear part and that's problem.

What is your opinion?

Shimuuar commented 12 years ago

Ping? This is only thing holding back 0.10.2. Personally I prefer second option more.

Shimuuar commented 12 years ago

Changeset above (not yet in master) modify dct_ and idct_ so they transform only real part of vector. Imaginary part of all value is set to 0 before transform. As I said already reason for this change is that currently both dct_ and idct_ do some strange linear transform to imaginary part. And this is simplest way to get reasonable behavior.

Any objections againist this?