ironted / aparapi

Automatically exported from code.google.com/p/aparapi

Idea: Object of arrays #128

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hi,

I'd like to move Aparapi towards a more object-oriented programming style. Currently,
you can create an array of Java objects, which is pretty slow, as each object
has to be mapped to a C++ struct.

The idea is to have a class annotated by @InlineClass.

Whenever this annotation is encountered, the class content is inlined into the 
kernel class, including fields and methods.
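A minimal sketch of what such an annotation and an annotated class might look like (the @InlineClass name comes from the idea above; the Body class and its fields are purely illustrative, not an existing Aparapi API):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation: classes tagged with it would have their
// fields and methods inlined into the generated kernel code.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface InlineClass {
}

// Hypothetical value-like class whose content would be inlined into the kernel.
@InlineClass
class Body {
    float x, y, z;

    float distanceSquaredTo(Body other) {
        float dx = x - other.x, dy = y - other.y, dz = z - other.z;
        return dx * dx + dy * dy + dz * dz;
    }
}
```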

Now, whereas the idea is pretty straightforward, the implementation is
definitely not that easy. I have now spent half a day thinking about how this could be
implemented.

In effect, I think the only working solution would be to merge the bytecode 
instructions from both classes. Unfortunately, this is also not easy, as the 
constant pool and all instructions referencing its entries have to be updated.

I think this is the only solution, as the kernel writing is based on the
bytecode instructions.

Now I am wondering how best to merge ClassModels. Is there any chance of doing
that? Or is it a hopeless endeavour? I mean, this is about code transformation
for a whole class...

Do you have any other ideas or thoughts on that topic?

Matthias

Original issue reported on code.google.com by matthias.klass@gmail.com on 16 Aug 2013 at 1:41

GoogleCodeExporter commented 8 years ago
The whole array-of-objects solution is tricky (as you probably know) because we
basically have to marshal/serialize arrays of objects into one contiguous block
of memory, move it to the GPU, then marshal/serialize the mutated objects back.

We only copy the accessed fields from the accessed objects, which helps
minimize the amount of data we move.
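Roughly what that marshalling step has to do can be sketched in plain Java (class and method names here are illustrative, not Aparapi internals): only the accessed field is gathered into one contiguous primitive array before the transfer, and scattered back afterwards.

```java
// Illustrative sketch (not Aparapi internals): gathering one accessed field
// from an array of objects into a contiguous primitive array - the kind of
// block that can be copied to the GPU in a single transfer.
class Particle {
    float x;
    float y; // never accessed by the kernel, so never copied

    Particle(float x, float y) { this.x = x; this.y = y; }
}

class Marshaller {
    // Gather: object field -> contiguous buffer (host -> GPU direction).
    static float[] gatherX(Particle[] particles) {
        float[] buffer = new float[particles.length];
        for (int i = 0; i < particles.length; i++) {
            buffer[i] = particles[i].x;
        }
        return buffer;
    }

    // Scatter: mutated buffer -> objects (GPU -> host direction).
    static void scatterX(Particle[] particles, float[] buffer) {
        for (int i = 0; i < particles.length; i++) {
            particles[i].x = buffer[i];
        }
    }
}
```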

The IBM Java folks are proposing an annotation called 'PackedObjects' which
would apply to array and object creation and helps denote alignment and padding.

I must confess that this all seems non-performant.  I have been working on the
lambda/HSA branch and it is a pure joy to allow the GPU to just follow
pointers, just like the CPU does.  So we just pass a pointer to an array of
objects and the GPU follows the pointers. There are still challenges with
virtual methods (especially when a new derived class gets loaded - normally the JVM
JIT handles this by recompiling all methods which might be affected).

Gary

Original comment by frost.g...@gmail.com on 16 Aug 2013 at 6:55

GoogleCodeExporter commented 8 years ago
Hm, I have now gone for a different approach. The @InlineClass annotation is
evaluated after bytecode parsing and afterwards explicitly handled in
KernelWriter. Not that good looking, but it seems to work. I'll port my
simulator and find out whether it really works :-)

In effect, this should be better than packing objects into a single memory block,
as we do not need to explicitly take care of every single object in the arrays,
but can just move the array directly to the GPU. Consequently, the native part
does not even need to know about the annotation, but is just given the
reference, no matter where the reference is actually placed.

HSAIL seems pretty nice, I must confess. How far along are you with that? I always
thought that you need to make the HotSpot compiler create proper HSAIL code.
Does this code also have to be parsed by Aparapi afterwards? Or can you execute
the code directly on the GPU? There must be some wrapper to tell which GPU
actually has to be used ...

Matthias

Original comment by matthias.klass@gmail.com on 20 Aug 2013 at 9:58

GoogleCodeExporter commented 8 years ago
HSAIL is kind of like bytecode for GPU devices (or, more correctly, data-parallel
style accelerator devices). Sumatra (the OpenJDK project which I and a few
other Aparapi committers/contributors are working on) will indeed allow the JVM
JIT compiler (HotSpot or Graal) to create HSAIL as a target, in the same way
that the current JITs can create x86/SPARC/ARM ISA code.

I am working in the branches/lambda tree to convert bytecode to HSAIL in much
the same way that we convert from bytecode to OpenCL at present.  The HSAIL
generation is coming along quite nicely, and on real hardware (AMD folk, not
surprisingly, have early access to HSA-enabled hardware) we get some good
performance.  Most of this performance comes from not having to move data
between the host memory and the GPU.  AMD calls this hUMA (heterogeneous
unified memory access), but I still refer to it as 'a pointer is a pointer'.

This allows us to access Java heap objects directly on the GPU.  

Gary  

Original comment by frost.g...@gmail.com on 20 Aug 2013 at 2:30

GoogleCodeExporter commented 8 years ago
Just a couple of quick questions; apologies if I should know these answers:

1) Aren't future OpenCL releases planning to use HSA under-the-covers?
   1a) If yes, why target HSAIL directly?

2) Will HSA still work as described in #3 if you do not have an APU/HSA-enabled 
device?
   2a) If no, is Aparapi planning to have multiple possible execution paths?
   2b) If yes, will this pave the way for CUDA as well?

Original comment by pnnl.edg...@gmail.com on 20 Aug 2013 at 4:14

GoogleCodeExporter commented 8 years ago
Answers to questions. 

1 + 1a)
   Whilst OpenCL may well be implemented on top of HSA, the use of SVM (Shared Virtual Memory) is not planned until OpenCL 2.0 (https://www.khronos.org/news/press/khronos-releases-opencl-2.0), so we would still be required to move/copy blocks of memory to the GPU in Aparapi until SVM is generally available.

2 + 2a)
   My understanding is that HSA features can only be expected from HSA-compatible devices. So for non-HSA devices we would need to use OpenCL.
   My thoughts are that we would extend the Aparapi execution framework to first look for HSA devices.  If an HSA device exists and we can convert to HSAIL, we target HSAIL.  If not (but we have OpenCL) we try OpenCL; if not, we dispatch in a thread pool.  The good news is that the HSAIL restrictions are a lot smaller than the OpenCL restrictions ;) So I would say that if we can't code in HSAIL, we can't code in OpenCL.
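That fallback order could be sketched like this (the enum and method names are illustrative, not the actual Aparapi execution framework):

```java
// Illustrative sketch of the proposed fallback order, not actual Aparapi code:
// prefer HSA, fall back to OpenCL, and finally to a Java thread pool.
enum ExecutionTarget { HSAIL, OPENCL, JAVA_THREAD_POOL }

class TargetChooser {
    static ExecutionTarget choose(boolean hsaDeviceAvailable,
                                  boolean kernelConvertsToHsail,
                                  boolean openClAvailable) {
        if (hsaDeviceAvailable && kernelConvertsToHsail) {
            return ExecutionTarget.HSAIL;        // first choice: HSA device
        }
        if (openClAvailable) {
            return ExecutionTarget.OPENCL;       // second choice: OpenCL
        }
        return ExecutionTarget.JAVA_THREAD_POOL; // last resort: CPU threads
    }
}
```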

I choose to punt on 2b) :)

Gary   

Original comment by frost.g...@gmail.com on 20 Aug 2013 at 4:30