Doubles generate compiler errors

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Any kernel with double values used

What is the expected output? What do you see instead?
I am testing with the kernel found in the user's guide, the one that takes an 
array of floats and squares each element. The kernel works fine for floats, but 
for doubles, I get:

************************************************
:1:26: warning: unknown '#pragma OPENCL EXTENSION' - ignored
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
                         ^
:4:13: error: must specify '#pragma OPENCL EXTENSION cl_khr_fp64: enable' 
before using 'double'
   __global double *val$out;
            ^

************************************************

What version of the product are you using? On what operating system?
- Ubuntu 11.10 64-bit
- Aparapi 2012-02-15 (latest version in Downloads at the time I write this)
- NVidia GTX480 with drivers v295.20 (latest at the time I write this)

Please provide any additional information below.
I assume the problem is I am using an NVidia card? I am available for any 
testing required.

Original issue reported on code.google.com by alex.kar...@gmail.com on 19 Feb 2012 at 8:57

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Also I am aware of issue 27, but it seems it does not apply to me (sorry for 
the double post)

Original comment by alex.kar...@gmail.com on 19 Feb 2012 at 8:59

GoogleCodeExporter commented 9 years ago

Thanks for reporting this.  We need to update the pragma handling for the 
NVidia runtime.  Clearly cl_amd_fp64 is an AMD-specific flag. 

Sadly someone with access to a recent NVidia card and driver will have to debug 
this.  

My suggestion is that whoever steps up to fix this just walk through (debugger 
in eclipse) the code which extracts the capabilities, determine which 
capability the NVidia driver/card reports for fp64, and add appropriate tests 
and pragmas to the OpenCLWriter.  This is probably a 20 minute fix for someone 
with the appropriate hardware.  

Gary

Original comment by frost.g...@gmail.com on 19 Feb 2012 at 5:15

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Alex I just reread issue 27.  Why do you think that it does not apply?

Original comment by frost.g...@gmail.com on 19 Feb 2012 at 5:19

GoogleCodeExporter commented 9 years ago

Actually I am probably mistaken about issue 27. I thought the cl_amd_fp64 thing 
was already fixed and the issue was only regarding NaN.

I have the source code now and am looking into what you suggested. Can you 
tell me in which class the code extracts GPU capabilities?

Original comment by alex.kar...@gmail.com on 19 Feb 2012 at 5:43

GoogleCodeExporter commented 9 years ago

Sure take a look at KernelRunner.  There is a method that extracts the 
capabilities of the device as a single string, then separates it into a 
HashSet.  There are helpers to check for specific required extensions (FP64 
should be one). If you look at the OpenCL writer it uses this information to 
initiate creating specific pragmas.
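
For anyone following along, here is a minimal sketch of that flow. It is not 
the actual KernelRunner or OpenCL writer code: the class and method names 
(Fp64PragmaSketch, parseExtensions, chooseFp64Extension) are hypothetical, and 
the sample extension string is made up for illustration. It only assumes that 
the device reports its extensions as one space-separated string, and that the 
standard cl_khr_fp64 name should be preferred over the AMD-specific one when 
both are present.

```java
import java.util.HashSet;
import java.util.Set;

public class Fp64PragmaSketch {

    /** Splits a space-separated extension string into a set of names. */
    static Set<String> parseExtensions(String extensions) {
        Set<String> set = new HashSet<>();
        for (String name : extensions.trim().split("\\s+")) {
            if (!name.isEmpty()) {
                set.add(name);
            }
        }
        return set;
    }

    /**
     * Picks the fp64 extension to enable, preferring the standard KHR name
     * and falling back to the AMD-specific one. Returns null when the
     * device reports neither.
     */
    static String chooseFp64Extension(Set<String> capabilities) {
        if (capabilities.contains("cl_khr_fp64")) {
            return "cl_khr_fp64";
        }
        if (capabilities.contains("cl_amd_fp64")) {
            return "cl_amd_fp64";
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical extension string, for illustration only.
        String reported = "cl_khr_byte_addressable_store cl_khr_fp64";
        System.out.println(chooseFp64Extension(parseExtensions(reported)));
        // prints cl_khr_fp64
    }
}
```

A writer could then emit a pragma only for the name this returns, and refuse 
to generate double code when it returns null.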

My guess is we can replace the AMD specific one (required prior to OpenCL 1.1) 
with a more general one.  Take a look. If I get a chance to try an NVidia card 
tomorrow I will take a look.

Gary

Original comment by frost.g...@gmail.com on 19 Feb 2012 at 11:10

GoogleCodeExporter commented 9 years ago

Thanks, that is very helpful :). I found the capabilitiesSet so I will see what 
I can get from there. 

Also, just a short note, Steve's suggestion (issue 27):

KernelWriter.java
-         writePragma("cl_amd_fp64", true);
+         writePragma("cl_khr_fp64", true);

makes doubles run correctly, but I assume this has undesired side-effects for 
other cards.
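
To make the effect of that one-line change concrete, here is a small sketch 
of the directive such a call would put at the top of the generated OpenCL 
source. The helper name pragmaLine and the class PragmaLineSketch are 
hypothetical and this is not Aparapi's writePragma implementation; the output 
format is simply copied from the compiler message quoted in the report.

```java
public class PragmaLineSketch {

    /** Builds one "#pragma OPENCL EXTENSION" line for the given extension. */
    static String pragmaLine(String extensionName, boolean enable) {
        return "#pragma OPENCL EXTENSION " + extensionName + " : "
                + (enable ? "enable" : "disable");
    }

    public static void main(String[] args) {
        // Before the change: the AMD-specific name the NVidia compiler ignores.
        System.out.println(pragmaLine("cl_amd_fp64", true));
        // After the change: the name the NVidia compiler asks for.
        System.out.println(pragmaLine("cl_khr_fp64", true));
    }
}
```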

Original comment by alex.kar...@gmail.com on 19 Feb 2012 at 11:29

GoogleCodeExporter commented 9 years ago

Actually that might just be the fix.  So this change works for you on your 
NVIDIA card?

I will try it with AMD.  I suspect it might just be as simple as this.   

Gary

Original comment by frost.g...@gmail.com on 20 Feb 2012 at 12:54

GoogleCodeExporter commented 9 years ago

Yes, that seems to do the trick for GTX480 and GTX580.

Original comment by alex.kar...@gmail.com on 20 Feb 2012 at 1:02

GoogleCodeExporter commented 9 years ago

This change also works for AMD cards (post OpenCL 1.1) so I will make this 
change. 

Gary

Original comment by frost.g...@gmail.com on 20 Feb 2012 at 8:11

GoogleCodeExporter commented 9 years ago

Alex, I just checked in the changes for this, basically Steve's suggestion for 
issue 27.  

This was added to SVN R#285

We will leave issue 27 open because it also refers to the NaN issues. 

Can you validate R#285?

Original comment by frost.g...@gmail.com on 20 Feb 2012 at 8:22

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Just checked latest svn (R#288) and it works :).

Original comment by alex.kar...@gmail.com on 21 Feb 2012 at 12:03

GoogleCodeExporter commented 9 years ago

Awesome. Thanks to Stephen for the fix.

Original comment by frost.g...@gmail.com on 21 Feb 2012 at 3:21

jordan30001 / aparapi

Doubles generate compiler errors #40