xiangyu / aparapi

Automatically exported from code.google.com/p/aparapi

Problem when trying to split kernels and execute some in JTP and others on GPU #137

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
   I modified the Aparapi code so that it passes the first half of the kernels (an arbitrarily picked half, for testing) to the GPU and, at the same time, runs the other half in JTP. I did this by halving the range passed to the GPU and to JTP, and by offsetting the JTP global IDs so that they cover only the second half of the kernels. A link to the code is attached below.
   I have tried doing this several ways. It works for some small cases, and it always works if I run one half and then the other, but when I run both at the same time I run into problems. My kernel contains two consecutive lines:
    c[gid] = sum;
    cCheck[gid] = 1;
and sometimes c has the correct value while cCheck is still 0. This happens regardless of the order in which I place these statements in the code. The number of errors and their locations vary from run to run.
   Is there something in the GPU code that could be interfering when memory is copied to or from the GPU, or does anyone have any idea what the problem could be?

Source: https://github.com/falbert9/AparapiSource
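To make the setup concrete, here is a minimal standalone sketch of the intended split using only the stock API (my actual code modifies the Aparapi internals instead; see the source link above). N and the value written into c are placeholders for my real workload:

import com.amd.aparapi.Kernel;
import com.amd.aparapi.Range;

public class HybridSplit {
    public static void main(String[] args) throws InterruptedException {
        final int N = 1024;
        final int HALF = N / 2;
        final float[] c = new float[N];
        final int[] cCheck = new int[N];

        // GPU kernel covers global IDs [0, HALF)
        final Kernel gpuHalf = new Kernel() {
            @Override public void run() {
                int gid = getGlobalId();
                c[gid] = 42f;        // stand-in for the real sum
                cCheck[gid] = 1;
            }
        };
        gpuHalf.setExecutionMode(Kernel.EXECUTION_MODE.GPU);

        // JTP kernel covers [HALF, N) via an explicit offset
        final Kernel jtpHalf = new Kernel() {
            @Override public void run() {
                int gid = getGlobalId() + HALF;
                c[gid] = 42f;
                cCheck[gid] = 1;
            }
        };
        jtpHalf.setExecutionMode(Kernel.EXECUTION_MODE.JTP);

        // Launch the JTP half on its own thread so both halves run concurrently
        Thread jtpThread = new Thread(new Runnable() {
            public void run() { jtpHalf.execute(Range.create(HALF)); }
        });
        jtpThread.start();
        gpuHalf.execute(Range.create(HALF));
        jtpThread.join();
    }
}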

Original issue reported on code.google.com by falbe...@vt.edu on 16 Jan 2014 at 7:58

GoogleCodeExporter commented 9 years ago
So this is tricky.  I created a 'hybrid' device in the lambda/HSA branch for 
this.  It works there (but performance is nowhere near as good as I expected) 
because HSA has coherent global memory. 

I suspect that you could get this to work if you hacked the buffer transfer 
code.

The problem is that Aparapi is probably copying the results from the GPU (the 
whole data set) over the top of the data computed on the CPU.

So given the canonical squares example:

final int[] in = new int[100];
final int[] out = new int[100];

Kernel s = new Kernel(){
   @Override public void run() {
      out[getGlobalId()] = in[getGlobalId()] * in[getGlobalId()];
   }
};

Even if you call 

s.execute(50);

the whole of the in[] buffer is copied to the GPU, and the whole of the out[] buffer is copied back out.

Aparapi cannot work out that the code is not using all of the buffers, so it has to copy them all.
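For reference, the stock explicit mode only lets you control when the transfers happen; the arrays still move in their entirety:

s.setExplicit(true);   // take over buffer transfer management from Aparapi
s.put(in);             // copies ALL of in[] to the device
s.execute(50);
s.get(out);            // copies ALL of out[] back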

One could hack the explicit memory calls so that they copy the data only partially:

s.setExplicit(true);
s.put(in, 50);   // instead of s.put(in); the 50 means copy only the first 50 elements
s.execute(50);
s.get(out, 50);  // as above
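Putting the pieces together, the split could then look like this end to end. To be clear, the two-argument put/get are the hypothetical hacked calls sketched above, not stock API:

// GPU computes the first half; only elements 0..49 ever move
s.setExplicit(true);
s.put(in, 50);       // hypothetical partial transfer of in[]
s.execute(50);
s.get(out, 50);      // hypothetical partial transfer back of out[]

// The CPU fills the second half, which the partial copy-back
// above no longer overwrites
for (int i = 50; i < out.length; i++) {
   out[i] = in[i] * in[i];
}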

Gary

Original comment by frost.g...@gmail.com on 16 Jan 2014 at 9:26

GoogleCodeExporter commented 9 years ago
I understand how to hack the .put(in, 50), but the get is a little harder. I found the get method and traced it back to

protected native int getJNI(long _jniContextHandle, Object _array);

in KernelRunnerJNI.java. From there I can find getJNI in 

/com.amd.aparapi.jni/include/com_amd_aparapi_internal_jni_KernelRunnerJNI.h

But that is just a header, not an implementation. I can't find where the getJNI method is actually implemented. Is the source for it available, or is it only in com.amd.aparapi.jni/dist/libaparapi_x86_64.so?

Thanks,
Curt

Original comment by falbe...@vt.edu on 27 Jan 2014 at 3:12

GoogleCodeExporter commented 9 years ago
com.amd.aparapi.jni/src/cpp/runKernel/Aparapi.cpp has the JNI binding for the method you are looking at.

Around line 1339.

Look for 
JNI_JAVA(jint, KernelRunnerJNI, getJNI)

I suspect that the token-pasting macros we use to paste the package names into the JNI function names obfuscated this a little.

https://code.google.com/p/aparapi/source/browse/trunk/com.amd.aparapi.jni/src/cpp/runKernel/Aparapi.cpp

Original comment by frost.g...@gmail.com on 27 Jan 2014 at 4:12