ironted / aparapi

Automatically exported from code.google.com/p/aparapi

Primitives with @Local annotation #41

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Have a primitive annotated with @Local. 

What is the expected output? What do you see instead?
Try to use the variable with the assumption that each block/group has a copy. 
Notice that each thread has its own copy.

Workaround:
Declare the variable as an array of size 1.

What version of the product are you using? On what operating system?
aparapi R288, Ubuntu 11.10 amd64, Java 7

Please provide any additional information below.
I need 2-3 single variables to be available to all threads in a block/group. I 
declare local memory variables and have the thread with localId() 0 read the 
value from the global memory and write it there (I am not sure this is a good 
idea, so any comments on that are welcome). After the assignment, I have a 
localBarrier(). 
-If you declare the variable as a primitive, only thread 0 sees the correct 
value (all others see the default value 0.0).
-If you declare the variable as an array of size 1, then the behaviour is what 
is expected from @Local.
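
The pattern described above (work item 0 fills a shared slot, everyone waits at 
a barrier, then everyone reads it) can be simulated in plain Java. This is only 
a sketch and is not Aparapi code: the class LocalBroadcastSketch and the method 
runGroup are hypothetical names, a CyclicBarrier stands in for localBarrier(), 
and a one-element int array stands in for the @Local array from the workaround.

```java
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;

// Plain-Java simulation of a single work group; this is NOT Aparapi code.
final class LocalBroadcastSketch {

    // Runs one simulated work group and returns the value each work item read.
    static int[] runGroup(int[] global, int groupId, int localSize) {
        final int[] local = new int[1];        // stand-in for an @Local one-element array
        final int[] seen = new int[localSize]; // what each work item observed
        final CyclicBarrier barrier = new CyclicBarrier(localSize); // stand-in for localBarrier()

        Thread[] items = new Thread[localSize];
        for (int i = 0; i < localSize; i++) {
            final int localId = i;
            items[i] = new Thread(() -> {
                try {
                    if (localId == 0) {
                        local[0] = global[groupId]; // single global read for the whole group
                    }
                    barrier.await();                // every item waits until the write is done
                    seen[localId] = local[0];       // every item reads the shared slot
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            items[i].start();
        }
        for (Thread t : items) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] global = {10, 20, 30};
        // Group 1 with four work items: every item should read global[1].
        System.out.println(Arrays.toString(runGroup(global, 1, 4)));
    }
}
```

Every simulated work item reads the value written by work item 0 because they 
all share the same array object, which is the behaviour the one-element array 
workaround provides.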

Original issue reported on code.google.com by alex.kar...@gmail.com on 27 Feb 2012 at 10:26

GoogleCodeExporter commented 8 years ago
Alex 

There were various possible targets for @Local, which were you trying to use?

Given this class

public class MyKernel extends Kernel{
   @Local int thisField;
   @Local int thisArrayField[] = new int[100];
   public void run(){
        @Local int thisLocalVariable;
        @Local int thisLocalArray[] = ??; 
   }
}

We only support the 'thisArrayField[]' from the example above.

'thisField' would be useful if the variable was read-only.  We don't allow 
writing to fields (not in generated OpenCL at least) because each work item 
gets its own thread so instance fields would cause a race (same in JTP 
actually!) 

Possibly @Local static int primitive might make sense. 

thisLocalVariable would be possible (not supported yet).

thisLocalArray would not be possible (we don't allow array aliasing).

But I would be interested in which of these you were wanting. 

Gary

Original comment by frost.g...@gmail.com on 27 Feb 2012 at 10:45

GoogleCodeExporter commented 8 years ago
[Nice presentation of the concept!]
So, I was referring to thisField and thisArrayField[]. thisArrayField[] works 
as expected, but the wiki about @Local gave me the impression that thisField 
would work as well, especially considering that a one-element thisArrayField[] 
works. 

My idea was to have thread 0 of each group write the field so that other work 
items in the same group wouldn't have to access it from the global memory. 
Something like:

public class MyKernel extends Kernel{
   @Local int thisField = 0;
   public void run(){
        if (getLocalId() == 0){
            thisField = anArrayFromTheGlobalMemory[getGroupId()];
        }
        localBarrier();
   }
}

Original comment by alex.kar...@gmail.com on 27 Feb 2012 at 11:05

GoogleCodeExporter commented 8 years ago
Yes, we don't actually support writes to fields. In a weird way it is best to 
think of fields as args.  Actually, that is what we do: we convert the accessed 
fields into OpenCL args.  For primitive scalars we use OpenCL's clSetKernelArg; 
for arrays we create buffers and set the buffer as the arg.  So, because 
scalars are copied by value as usual, modifying a scalar field does nothing. It 
does change the arg for that work item for a while, but the change can never be 
seen by another executing work item. 

Run aparapi with -Dcom.amd.aparapi.enableShowOpenCL=true and take a look at the 
args, I think it will help explain why fields are not mutable unless they are 
arrays. 
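
A plain-Java analogy of the difference between a scalar arg and a buffer arg 
may help here. This is only a sketch, not Aparapi or OpenCL code: the class 
ArgPassingSketch and its two methods are hypothetical, and each method call 
stands in for one work item receiving its kernel args.

```java
// Plain-Java analogy only; this is NOT Aparapi or OpenCL code.
final class ArgPassingSketch {

    // Stand-in for a work item that receives a scalar arg (copied by value).
    static void writeScalar(int scalarArg, int value) {
        scalarArg = value; // changes only this invocation's private copy
    }

    // Stand-in for a work item that receives a buffer arg (a reference to shared storage).
    static void writeBuffer(int[] bufferArg, int value) {
        bufferArg[0] = value; // visible to everyone holding the same buffer
    }

    public static void main(String[] args) {
        int scalarField = 0;           // like a primitive field passed as a scalar arg
        int[] arrayField = new int[1]; // like a one-element array passed as a buffer

        writeScalar(scalarField, 42);
        writeBuffer(arrayField, 42);

        System.out.println("scalar seen by others: " + scalarField);          // 0
        System.out.println("array element seen by others: " + arrayField[0]); // 42
    }
}
```

The scalar write is lost because the callee only ever had a copy, while the 
array write lands in storage that the caller also sees.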

However, I think that if the field were a static field, we might be able to 
infer that it can be changed and convert it to a one-element array (behind the 
scenes).  This would then also work from both the Java and OpenCL sides. 

However, we don't have this yet ;) Sorry about that.  I will modify the wiki 
page to try to explain this. 

Original comment by frost.g...@gmail.com on 28 Feb 2012 at 12:53

GoogleCodeExporter commented 8 years ago
Well, glad to have sorted it out :-) . I will take a look at the generated code 
as well.

Original comment by alex.kar...@gmail.com on 28 Feb 2012 at 1:14

GoogleCodeExporter commented 8 years ago
What OpenCL args? What do they have to do with local memory?

Local memory is a little bit (16-48KB) of very fast SRAM that is located inside 
a GPU core and can be used as a scratchpad or a manual cache by the 
threads/work-items of a block/work-group. It has nothing to do with passing 
args (which should go into global mem).

A @Local scalar makes perfect sense. "We don't allow writing to fields (not in 
generated OpenCL at least) because each work item gets its own thread so 
instance fields would cause a race (same in JTP actually!)" And that's the 
reason for having barriers...

Original comment by adubin...@almson.net on 16 Feb 2013 at 12:41

GoogleCodeExporter commented 8 years ago
See 
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clSetKernelArg.html

Scroll down to the description of arg_value and arg_size. 

OpenCL allows you to allocate local memory (on the GPU) by setting an ARG as 
local and defining the size.  We use this mechanism to allocate local buffers, 
because it allows us to vary the local size without recompiling the Kernel.  

Gary

Original comment by frost.g...@gmail.com on 16 Feb 2013 at 12:46

GoogleCodeExporter commented 8 years ago
Oh, I see. You pass a local memory pointer as a way of creating a variable-size 
local memory array.

It is also possible to declare constant-size arrays and scalars. In OpenCL 
they're method-scoped variables. In fact, I would say they are more common than 
variable-sized arrays (since fixed size or scalar is simpler than variable).

I think you should support them, for the sake of consistency. I am not sure how 
you are generating the OpenCL or the Java fallback, but I expect you have 
enough flexibility. (For Java fallback, I would make @Local variables 
non-static fields and @Global variables static. I would then make one object 
per work group, rewrite work-items as for() loops in the object's run() method, 
and make method-scoped @Local variables into the non-static fields.)
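
A rough sketch of that fallback idea in plain Java, under several assumptions: 
the class WorkGroupFallbackSketch and the method runAll are hypothetical, the 
shared global arrays are passed by reference instead of being static fields, 
and the kernel body is a made-up example. It is not how Aparapi generates its 
Java fallback.

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed fallback; NOT how Aparapi is implemented.
final class WorkGroupFallbackSketch {
    private final int[] global;  // input shared by all groups (the "global" data)
    private final int[] out;     // output shared by all groups
    private final int groupId;
    private final int localSize;
    private int localValue;      // the "@Local" scalar: one copy per group object

    WorkGroupFallbackSketch(int[] global, int[] out, int groupId, int localSize) {
        this.global = global;
        this.out = out;
        this.groupId = groupId;
        this.localSize = localSize;
    }

    void run() {
        // Phase 1: the code before the barrier, executed once per work item.
        for (int localId = 0; localId < localSize; localId++) {
            if (localId == 0) {
                localValue = global[groupId];
            }
        }
        // Finishing the loop above plays the role of localBarrier().
        // Phase 2: the code after the barrier, executed once per work item.
        for (int localId = 0; localId < localSize; localId++) {
            out[groupId * localSize + localId] = localValue + localId;
        }
    }

    // Creates one object per work group and runs the groups one after another.
    static int[] runAll(int[] global, int localSize) {
        int[] out = new int[global.length * localSize];
        for (int g = 0; g < global.length; g++) {
            new WorkGroupFallbackSketch(global, out, g, localSize).run();
        }
        return out;
    }

    public static void main(String[] args) {
        // Two groups of three work items each.
        System.out.println(Arrays.toString(runAll(new int[] {100, 200}, 3)));
    }
}
```

Splitting the work-item loop at each barrier keeps the semantics sequential, 
so the per-group field needs no synchronization in this sketch.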

Original comment by adubin...@almson.net on 16 Feb 2013 at 1:38