Closed lilltroll77 closed 12 years ago
I could try and write an FFT; using either high or normal precision arithmetic. Would this mean that all intermediate results are 64-bits wide? Or do you just compute one intermediate result at 64 bits and then reduce the intermediate result to 32 bits?
X0..XN are stored in K bits, where K is either 32 or 64. Root0..RootN are stored in 32 bits.
X0' := Root0 * X0 + Root1 * X1 X1' := Root0 * X0 - Root1 * X1
If K is 32, then the * is is 32x32 -> 64, and the + is 64+64 -> 64, the := is 64 -> 32 If K is 64, then the * is is 64x32 -> 96, and the + is 96+96 -> 96, the := is 96 -> 64
The former will just be MACCS instructions - the latter will be more expensive...
By the way, how many points in your FFT? 4096? (that is 16K worth of data - about as much as a core can cope with)
A first version FFT is in module_fft_simple. There are a number of obvious optimisations that I need to implement next. This version multiplies the FFT by N which needs repairing (but it gives a high precision answer as a result)
I am not sure how to compute a fair dBFS figure, but you probably can if I give you the numbers For a 1024 point FFT followed by an inverse FFT with random input in the range 0..0xFFFFFF (ie, 24 bit positive numbers) I get a sum of squared errors over both imaginary and real part of 1425554. This seems like a 5-bit error on a 24-bit signal, or 19 bits precision.
The current version is reasonably close to optimal speed (for this algorithm) and has reasonable accuracy. Either can be improved at a cost to the other...
Bottom line: about 1 bit of error for every factor 4 in the number of points, so 1024 points has 5-6 bits of error. 4096 points will have 6-7 bits of error. With a 24 bit signal you will have 16-17 bits of accuracy in your result, which is around 6_17 to 6_17 = -96 to -102 dB.
I will pull and take a look!
The 64 bit MACS is a little bit better compared to a typical 40 bit.
Have you made any calculations on the possible SNR using FFT for FIR filtering overlap and add.
I have never written a low level FFT implementation for an ISA my self like XMOS.
Is it possible to reach >100 dBFS without using extended precision.