New OpenCL FFT implementation

GoogleCodeExporter commented 9 years ago

Currently the performance of adda_ocl is limited by the Apple clFFT it uses, which 
was originally created mostly as a proof of principle. The emerging alternative 
is the AMD implementation. It should be faster than the Apple one and support 
radices of 3 and 5 (like the Temperton FFT). It remains an open question whether 
it can be used with Nvidia cards.
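
To illustrate why radices 3 and 5 matter, here is a minimal sketch of picking an FFT size that factors into 2, 3 and 5 only. The helper names `is_smooth_235` and `next_fft_size` are hypothetical and are not taken from adda_ocl or from either FFT library. The set of supported radices is assumed from the description above.

```python
def is_smooth_235(n: int) -> bool:
    """Return True if n is positive and has no prime factors other than 2, 3, 5."""
    if n < 1:
        return False
    # Divide out every allowed radix; anything left over is an unsupported factor.
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def next_fft_size(n: int) -> int:
    """Return the smallest size >= n that factors into 2, 3 and 5 only."""
    size = max(n, 1)
    while not is_smooth_235(size):
        size += 1
    return size


if __name__ == "__main__":
    for requested in (97, 128, 129):
        print(requested, "->", next_fft_size(requested))
```

With radix 2 alone, a requested size of 97 would have to be padded to 128. Allowing 3 and 5 brings it down to 100.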

Overall, finding and linking to more advanced OpenCL FFT implementations is 
definitely the main direction for development of adda_ocl.

Original issue reported on code.google.com by yurkin on 16 Apr 2012 at 1:33

GoogleCodeExporter commented 9 years ago

Usage of the AMD FFT library has been controlled via ocl/Makefile since r1155.
It seems that the backward FFT of the AMD library still produces some errors. 
This was a known problem for certain power-of-2 sizes, where radix-4 and radix-8 
are involved.
With an AMD Radeon HD 5870 2GB, the proprietary AMD device driver 
Catalyst 12.4 and 12.6, and AMD APPML FFT versions 1.6.244 and 1.8 Beta, the 
problem still exists.
Timing was done using the AMD FFT as the forward FFT and the Apple FFT as the 
backward FFT.
If the backward FFT of AMD can be used inside a-dda, then with a Radeon HD 5870 it 
will speed up the FFT part to about a factor of 10 of the arithmetic part.
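
One way such backward-FFT errors could be detected is a round-trip check: apply the forward transform, then the backward one, and compare the result (divided by N) with the input. The sketch below is only an illustration and is not the test that was actually run. It uses a naive pure-Python DFT as a stand-in for the library calls, and all function names are hypothetical.

```python
import cmath


def dft(x, sign):
    """Naive O(N^2) DFT. sign=-1 is forward, sign=+1 is the unnormalized backward."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def reference_forward(x):
    return dft(x, -1)


def reference_backward(x):
    return dft(x, +1)


def roundtrip_error(forward, backward, n):
    """Max absolute deviation of backward(forward(x)) / n from x."""
    # Deterministic, non-symmetric test input so that index errors are visible.
    x = [complex((3 * j) % 7, (5 * j) % 11) for j in range(n)]
    y = backward(forward(x))
    return max(abs(a / n - b) for a, b in zip(y, x))


if __name__ == "__main__":
    # Power-of-2 sizes, where the radix-4 and radix-8 kernels would be used.
    for n in (4, 8, 16, 64):
        err = roundtrip_error(reference_forward, reference_backward, n)
        print(n, "ok" if err < 1e-9 else "MISMATCH", err)
```

In a real test, `reference_backward` would be replaced by a wrapper around the library transform under suspicion, while a trusted implementation stays as the forward transform.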

Original comment by Marcus.H...@gmail.com on 15 Aug 2012 at 10:09

GoogleCodeExporter commented 9 years ago

This issue was closed by revision r1178.

Original comment by yurkin on 18 Jan 2013 at 7:46

Changed state: Fixed