jordan30001 / aparapi

Automatically exported from code.google.com/p/aparapi

Create a Kernel library of pre-configured kernels #33

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I think it would be extremely valuable if Aparapi supplied a library of
pre-configured and optimized kernels for end users. For example, it would be
nice to have a library of kernels with
functionality similar to the library of example code available for CUDA, except 
for OpenCL via Aparapi. This could also help to augment any documentation in 
the Wiki needed to explain each use case.

My main motivation behind this request is the fact that even though all of the 
Aparapi examples appear to use single class files with kernels defined as inner 
classes, in my experience most production use of Aparapi will define kernels in 
separate classes which are then instantiated and executed from somewhere in the 
application code. There have been a number of times it would have been nice if 
there was a pre-configured XYZ kernel to use instead of writing one from 
scratch (after investigating the necessary logic in OpenCL or CUDA).

Original issue reported on code.google.com by ryan.lam...@gmail.com on 23 Jan 2012 at 2:10

GoogleCodeExporter commented 9 years ago
This would be a great thing to have - perhaps even an Aparapi wiki of sorts
where users could submit kernels (and others improve them) for specific 
functionality? It'd be great if for relatively common tasks people could 
download a Java kernel rather than have to work around the limitations of 
OpenCL.

Original comment by berry...@gmail.com on 6 Feb 2012 at 8:19

GoogleCodeExporter commented 9 years ago
Can you take a quick look at 
http://code.google.com/p/aparapi/wiki/AparapiExtensionProposal to see if it 
would help?

Scroll down to the end because Witold had a suggestion for a cleaner solution 
(than my original proposal).  I plan on taking a look at this (after I check in 
the multi-dim range and local memory branches). 

So any feedback would be welcome. 

Original comment by frost.g...@gmail.com on 6 Feb 2012 at 10:17

GoogleCodeExporter commented 9 years ago

Original comment by frost.g...@gmail.com on 14 Feb 2012 at 5:30

GoogleCodeExporter commented 9 years ago
I'd got out of touch with Java (and programming in general), but got to looking
at Aparapi through the AMD developer resources just out of interest (great
presentation there, Gary!). Then it occurred to me that I could try cramming
some awesome compute out of crappy old Java on an OC'd HD5450 & FX8120. Add in
the excuse that it would be some interesting practice, and that was all I
needed to get the bug back.

So disclaimer in place, FWIW: 'End-user' goodies like this could well be
wrapped up into a package of 'accessible' objects ("Wraparapi" would be my
suggestion for the perfect name BTW), and are probably well suited to a
community project. I can well imagine an Eclipse or NetBeans plugin for apps
which execute dynamically-prepared kernels, driven entirely through any one of
the various text-based modelling/execution environments available.

With only 2x64 concurrent range indexing options on the HD5450 (discovered in 
my "demo-fail" output, then confirmed via AMD OpenCL docs), a pseudo wavefront 
scheduler to optimize groups depending on the hardware seems an obvious first 
candidate for wrapping. 
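
As a rough illustration of the kind of group-size helper described above, here is a minimal Java sketch. It is not part of Aparapi: the class name `GroupSizePicker`, its methods, and the `maxGroupSize` and `wavefrontSize` parameters are all hypothetical, and the real limits would have to be queried from the device at runtime.

```java
/** Hypothetical helper, not an Aparapi class: picks a work-group size for a 1D range. */
final class GroupSizePicker {
    private GroupSizePicker() {}

    /**
     * Returns the largest local size that divides globalSize evenly and does not
     * exceed maxGroupSize, preferring whole multiples of wavefrontSize.
     * Both limits are assumed to be supplied by the caller (e.g. from a device query).
     */
    static int pickLocalSize(int globalSize, int maxGroupSize, int wavefrontSize) {
        if (globalSize <= 0 || maxGroupSize <= 0 || wavefrontSize <= 0) {
            throw new IllegalArgumentException("all sizes must be positive");
        }
        int limit = Math.min(globalSize, maxGroupSize);

        // First pass: try whole multiples of the wavefront size, largest first.
        for (int size = (limit / wavefrontSize) * wavefrontSize; size >= wavefrontSize; size -= wavefrontSize) {
            if (globalSize % size == 0) {
                return size;
            }
        }
        // Fallback: the largest divisor of globalSize that fits under the limit.
        for (int size = limit; size > 1; size--) {
            if (globalSize % size == 0) {
                return size;
            }
        }
        return 1;
    }

    /** Pads a work-item count up to the next multiple of localSize. */
    static int roundUpGlobalSize(int workItems, int localSize) {
        if (workItems < 0 || localSize <= 0) {
            throw new IllegalArgumentException("workItems must be >= 0 and localSize > 0");
        }
        return ((workItems + localSize - 1) / localSize) * localSize;
    }
}
```

If the range is padded with `roundUpGlobalSize`, the kernel body would need a bounds check so that the extra work items do nothing.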

The difficulties in providing 'familiar to the average programmer' Java have
been discussed elsewhere (I've been paying attention), but I think that many of
those difficulties can be resolved without too much more on top. My suspicion 
is that only a few additional OpenCL calls, particularly those which provide 
info useful for optimal kernel/buffer configuration at runtime, might be all 
that's actually needed to be able to encapsulate a reasonably generic baseline 
to cover this kind of pre-configuration utility. 

Since there's nothing anyone can do, performance-wise, about how data is
fetched to Aparapi, ideally the only thing that would need managing is memory
buffer exposure, which might then be tackled through a code contract. The really
challenging side job for any interface generalization will be to implement
efficient data interpretation and conversion into and out of buffers; 
especially where data (and even operations??!!) are coming from an external 
stream. 
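
To make the buffer conversion step concrete, here is a minimal sketch of packing row-based data into a flat primitive array and unpacking it again, which is the shape of data a kernel typically works on. The `BufferPacker` class and its method names are hypothetical, and it assumes rectangular input; a streaming source would need an incremental variant.

```java
/** Hypothetical helper, not an Aparapi class: converts between 2D data and a flat buffer. */
final class BufferPacker {
    private BufferPacker() {}

    /** Copies a rectangular float[][] into a single row-major float[]. */
    static float[] flatten(float[][] rows) {
        if (rows.length == 0) {
            return new float[0];
        }
        int width = rows[0].length;
        float[] flat = new float[rows.length * width];
        for (int y = 0; y < rows.length; y++) {
            if (rows[y].length != width) {
                throw new IllegalArgumentException("row " + y + " has a different length");
            }
            System.arraycopy(rows[y], 0, flat, y * width, width);
        }
        return flat;
    }

    /** Rebuilds a float[][] with the given row width from a row-major float[]. */
    static float[][] unflatten(float[] flat, int width) {
        if (width <= 0 || flat.length % width != 0) {
            throw new IllegalArgumentException("buffer length must be a multiple of width");
        }
        float[][] rows = new float[flat.length / width][width];
        for (int y = 0; y < rows.length; y++) {
            System.arraycopy(flat, y * width, rows[y], 0, width);
        }
        return rows;
    }
}
```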

So I'm following with interest and experimenting like a loon (yeah, thanks for 
the heavy pizza nights and even heavier coffee mornings BTW). Having focused on 
some interesting math problems worth trying, if I put together anything worth 
poking a stick at I'll drop it up.

Original comment by earl.bra...@gmail.com on 9 Nov 2012 at 2:31

GoogleCodeExporter commented 9 years ago
Thanks for getting in touch and for taking a stab at this.  When I present at
JUGs (off to Maryland and Washington DC JUGs this week) I always get asked
about the availability of libraries, so I do think this would be very well
received.  Let me know if you have any questions.

You might consider looking at the new(ish) feature that allows you to wrap
canned OpenCL code (see the trunk extensions mandel example); it may help you
gain more control over the device.

On the Eclipse/NetBeans front I have often wanted to add a 'refactoring' which
allows me to highlight a nested loop and automatically create a Kernel class 
and insert the dispatch code.  That would be fun ;) 
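
A sketch of what such a refactoring might produce, written without any Aparapi dependency so it runs as-is. The `dispatch` method here is a hypothetical sequential stand-in for a device dispatch, and `LoopToKernelSketch` is an illustrative name only; the point is just the mapping from a nested loop to a body that runs once per global id.

```java
import java.util.function.IntConsumer;

/** Hypothetical illustration: a nested loop and its one-body-per-index equivalent. */
final class LoopToKernelSketch {
    private LoopToKernelSketch() {}

    /** Original form: a nested loop over rows and columns of a row-major array. */
    static void scaleNested(float[] data, int width, int height, float factor) {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                data[y * width + x] *= factor;
            }
        }
    }

    /** Stand-in for a dispatch call: runs the body once per global id, sequentially. */
    static void dispatch(int range, IntConsumer body) {
        for (int gid = 0; gid < range; gid++) {
            body.accept(gid);
        }
    }

    /** Refactored form: the loop body becomes a function of a single global id. */
    static void scaleAsKernel(float[] data, int width, int height, float factor) {
        dispatch(width * height, gid -> {
            int y = gid / width; // recover the row from the flat index
            int x = gid % width; // recover the column from the flat index
            data[y * width + x] *= factor;
        });
    }
}
```

Because each index touches only its own element, the body has no cross-iteration dependency, which is the property a tool would have to check before offering the refactoring.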

And once again, thanks for getting involved. 

Gary

Original comment by frost.g...@gmail.com on 9 Nov 2012 at 9:24