Open GoogleCodeExporter opened 9 years ago
I know this isn't a defect, but I can't figure out how to change it
Original comment by falbe...@vt.edu
on 23 Oct 2013 at 7:01
Thanks for raising this. I changed it to an enhancement request, rather than
a bug.
Actually this will be hard to fully implement in Aparapi. As you indicate,
this would need a modified JVM, and of course Aparapi tries to be JVM agnostic
and does not have access to JVM internals (except as exposed via
JVMTI/JNI/Unsafe).
I think this is a great idea, but one that I suspect may need to be looked at
in the context of 'Project Sumatra', which is aiming at bringing GPU
compute/offload to the JVM itself - and as such is indeed a modified JVM.
http://openjdk.java.net/projects/sumatra/
http://www.extremetech.com/computing/137628-project-sumatra-improves-java-performance-with-opencl-graphics-card-acceleration
I would be interested in a use case (no matter how trivial) and an example of
how you would like to see (for example) data auto-vectorized.
Gary
Original comment by frost.g...@gmail.com
on 24 Oct 2013 at 1:46
I was looking at modifying Oracle's Maxine JVM to actually produce the
vectorized assembly code, but was going to use the Aparapi interface to justify
that the variables are independent and vectorizable, since they must be in
order to be split up across a GPU. The part of Aparapi that I am looking to
modify would perform the loop unroll and mark the code to tell the modified JVM
to produce vectorized instructions for those variables if possible. Which part
of the Aparapi code should I look into to perform this, or should I just
implement this separately and run it on the code before Aparapi?
Original comment by falbe...@vt.edu
on 24 Oct 2013 at 7:33
For a basic example say there is a for loop
for(int i=0; i < 64; i++)
{
a[i] = b[i] * c[i];
}
could be vectorized to
for(int i=0; i < 64; i+=2)
{
// pseudocode: one 2-wide vector multiply covering elements i and i+1
a[i..i+1] = b[i..i+1] * c[i..i+1];
}
There are some benchmarks where the GPU is faster than JTP, and this obviously
wouldn't apply to them. But when JTP is faster due to the overhead of copying
data to and from the GPU, my thought is that this would speed up the JTP
execution, just like vectorization does in C.
Original comment by falbe...@vt.edu
on 24 Oct 2013 at 7:38
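To make the loop example above concrete, here is a minimal plain-Java sketch of
the same element-wise multiply, written once as a scalar loop and once unrolled
by two. The class and method names (UnrollSketch, multiplyScalar,
multiplyUnrolledBy2) are hypothetical and are not part of Aparapi or Maxine.
Unrolling by hand only exposes the independent pairs; whether the JIT actually
emits vector instructions for them depends on the JVM.

```java
import java.util.Arrays;

public class UnrollSketch {

    // Scalar baseline: one independent multiply per iteration.
    static void multiplyScalar(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] * c[i];
        }
    }

    // Unrolled by two: each iteration handles elements i and i+1.
    // The two statements do not depend on each other, which is the
    // property a vectorizing compiler would need to pack them together.
    static void multiplyUnrolledBy2(float[] a, float[] b, float[] c) {
        int n = a.length;
        int i = 0;
        for (; i + 1 < n; i += 2) {
            a[i] = b[i] * c[i];
            a[i + 1] = b[i + 1] * c[i + 1];
        }
        // Remainder loop for odd lengths.
        for (; i < n; i++) {
            a[i] = b[i] * c[i];
        }
    }

    public static void main(String[] args) {
        int n = 64;
        float[] b = new float[n];
        float[] c = new float[n];
        for (int i = 0; i < n; i++) {
            b[i] = i;
            c[i] = 2.0f;
        }
        float[] scalar = new float[n];
        float[] unrolled = new float[n];
        multiplyScalar(scalar, b, c);
        multiplyUnrolledBy2(unrolled, b, c);
        // Both versions should compute identical results.
        System.out.println(Arrays.equals(scalar, unrolled));
    }
}
```

The remainder loop is there only so the sketch also works for odd array
lengths; with the fixed length of 64 from the example it never runs.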
Ah I see.
So you might want to look at using Graal for this.
https://wikis.oracle.com/display/MaxineVM/MaxineGraal
Graal (derived from Maxine) is, as far as I can tell, also the engine we are
using in Sumatra to generate HSAIL code for GPU offload. The idea of
vectorizing via Graal is apparently not new (although it was new to me :) )
http://www.boston-technology.com/blog/what-does-graal-mean-for-java/
BTW having played with Graal for a while now, I am considering using it for GPU
code generation for Aparapi. It supports interesting optimizations and is
great at inlining.
I think this is a great idea for a project.
Gary
Original comment by frost.g...@gmail.com
on 24 Oct 2013 at 10:02
Original issue reported on code.google.com by
falbe...@vt.edu
on 23 Oct 2013 at 7:00