yogevb / a-dda

Automatically exported from code.google.com/p/a-dda

Problems with clAmdFft library #157

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
After the recent changes, ADDA produces results in combination with clAmdFft 
1.8.239 in many cases. However, this library is not yet as robust as we want 
it to be, so this issue is meant for discussing the existing shortcomings:

1) The library can be tested on any computer using the client application 
supplied with it. However, that application only works in single precision, 
which is currently irrelevant to ADDA. Still, these tests produce failures on 
Nvidia GeForce 540M (on Windows 7 64-bit) - see the comment to r1155. However, 
these tests are based on transforming a constant array and comparing the 
results against the expected output (zeros almost everywhere), so the failures 
can be caused by loss of precision. This is probably why ADDA works fine on 
the same machine - see r1178 (it produces sufficiently accurate results).

2) On the same machine as above, when ADDA is compiled in 32-bit mode and 
linked to the 32-bit clAmdFft, it crashes for almost any grid size (-grid ...), 
except 6, 10, 14, and 16. The corresponding (working) FFT sizes are 12, 20, 30, 
and 32. I localized the crash to the creation of the first FFT plan, and the 
critical factor is double precision: the crash occurs even for a single (not 
batched) FFT of a double-precision array, while any batch of single-precision 
FFTs works fine.

The crash is most probably a library issue, but it might also be connected to 
the 32-bit OpenCL.dll provided by Nvidia (I used driver 306.97). 
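
The correspondence between grid sizes and FFT sizes in item 2 (6, 10, 14, 16 -> 12, 20, 30, 32) is consistent with doubling the grid size and rounding up to the nearest number whose only prime factors are 2, 3, and 5. The sketch below is an assumption that reproduces the listed sizes, not code taken from ADDA; the function names `fft_fit` and `fft_size_for_grid` are hypothetical.

```python
def fft_fit(n: int, primes=(2, 3, 5)) -> int:
    """Return the smallest integer >= n whose prime factors all lie in `primes`."""
    m = n
    while True:
        r = m
        # Divide out every allowed prime; if 1 remains, m is an acceptable size.
        for p in primes:
            while r % p == 0:
                r //= p
        if r == 1:
            return m
        m += 1


def fft_size_for_grid(grid: int) -> int:
    # Assumption: the FFT size is twice the grid size, rounded up by fft_fit.
    return fft_fit(2 * grid)


if __name__ == "__main__":
    for grid in (6, 10, 14, 16):
        print(f"-grid {grid}: FFT size {fft_size_for_grid(grid)}")
```

Under this assumption, -grid 14 gives 30 rather than 28, because 28 contains the factor 7.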

Original issue reported on code.google.com by yurkin on 20 Jan 2013 at 5:29

GoogleCodeExporter commented 9 years ago
Addition to problem (1) above: there is indeed a loss of accuracy with clAmdFft 
in the cases where its internal tests fail. 

In particular, running ADDA with '-grid 10' produces the same result for Cext - 
28.10875812 for sequential and both OpenCL versions (using Apple clFFT and 
clAmdFft). This correlates with the test passing for FFT size 20.

However, for '-grid 16' the results are 135.0449041, 135.044904, and 
135.0450408, respectively. This correlates with the test failing for FFT size 32.
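
To put the '-grid 16' numbers in perspective, here is a small sketch that computes the relative deviation of each OpenCL result from the sequential one. It uses only the Cext values quoted above; the dictionary keys and variable names are illustrative.

```python
# Cext values for '-grid 16' as quoted in this comment.
cext = {
    "sequential": 135.0449041,
    "Apple clFFT": 135.044904,
    "clAmdFft": 135.0450408,
}

ref = cext["sequential"]

# Relative deviation of each variant from the sequential result.
rel_dev = {name: abs(value - ref) / ref for name, value in cext.items()}

if __name__ == "__main__":
    for name, dev in rel_dev.items():
        print(f"{name:12s} Cext = {cext[name]:.7f}  relative deviation = {dev:.2e}")
```

The Apple clFFT value differs from the sequential one only in the last printed digit, while the clAmdFft value deviates by about 1e-6 in relative terms.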

This loss of accuracy can become more significant for longer runs (with slower 
convergence of the iterative solver).

Original comment by yurkin on 30 Jan 2013 at 6:53

GoogleCodeExporter commented 9 years ago
I have just tested clAmdFft 1.10.274 on Windows. There seem to be no changes 
in the obtained results or in any of the bugs described above. The only 
difference is that the 32-bit build on 64-bit Windows now also crashes with 
-grid 16 (it still works fine with -grid 6, 10, and 14).

It is still worth looking in detail at whether the new release of clAmdFft 
contains any new features. In particular, it seems to contain a flag for 
faster (but potentially less accurate) transforms.

Original comment by yurkin on 15 Apr 2013 at 1:57

GoogleCodeExporter commented 9 years ago

Original comment by yurkin on 3 Aug 2014 at 4:51

GoogleCodeExporter commented 9 years ago

Original comment by yurkin on 3 Aug 2014 at 4:59