yaohuaxin / aparapi

Automatically exported from code.google.com/p/aparapi

Working with float[][] #10

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a small kernel that uses float[][]
2. Execute it

What is the expected output? What do you see instead?
Either an error message like "not supported" or a "FallBackWarning".
I see this message instead:

Failed to execute: null
java.lang.NullPointerException
    at com.amd.aparapi.KernelWriter.convertType(KernelWriter.java:111)
    at com.amd.aparapi.KernelWriter.write(KernelWriter.java:260)
    at com.amd.aparapi.KernelRunner.execute(KernelRunner.java:1197)
    at com.amd.aparapi.Kernel.execute(Kernel.java:1523)
    at com.amd.aparapi.Kernel.execute(Kernel.java:1469)

What version of the product are you using? On what operating system?
r89, Windows x86, Oracle JDK 1.6.0_20

Please provide any additional information below.
The problem is in the field type parser. Because the field is a float[][], the 
type name is "[[F".
The code strips the first '[' and then assumes that the next character is the 
element type, but here it is another '['.
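For context, a JVM field descriptor encodes one leading '[' per array dimension, followed by the element type, so "[F" is float[] and "[[F" is float[][]. A robust parse counts the leading brackets instead of looking only at the character after the first one. An illustrative snippet (not Aparapi's actual parser code):

```java
// Illustrative snippet (not Aparapi's actual parser): a JVM field
// descriptor encodes one leading '[' per array dimension, followed by
// the element type, so "[F" is float[] and "[[F" is float[][].
public class DescriptorDims {

    // Count the leading '[' characters instead of assuming that the
    // character after the first '[' is already the element type.
    static int dimensions(String descriptor) {
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;
        }
        return dims;
    }
}
```

Note that `float[][].class.getName()` returns exactly the "[[F" form quoted above.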

It would be very nice if multidimensional arrays were supported. I do not need 
jagged arrays, so my solution is to rewrite the array as a single float[] and 
then just do the index calculations manually. However, this causes a lot of 
memory copying: I copy outside Aparapi, then Aparapi copies to the device, and 
the same on the way back.
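The flatten-and-index workaround described above might look like this (hypothetical helper names, not part of Aparapi):

```java
// Hypothetical helper (not part of Aparapi) showing the workaround:
// flatten a rectangular float[][] into a float[] and index it with
// row-major arithmetic inside the kernel.
public class Flatten2D {

    static float[] flatten(float[][] src) {
        final int rows = src.length;
        final int cols = src[0].length;
        final float[] flat = new float[rows * cols];
        for (int i = 0; i < rows; i++) {
            // One host-side copy per row; this is the extra copying
            // described above, before Aparapi copies again to the device.
            System.arraycopy(src[i], 0, flat, i * cols, cols);
        }
        return flat;
    }

    // Element (i, j) of the original array becomes flat[i * cols + j].
    static float get(float[] flat, int cols, int i, int j) {
        return flat[i * cols + j];
    }
}
```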

I would be interested in having a go a implementing this for non-jagged arrays 
inside Aparapi, could you give me some hints as to where I should look in the 
code?

Original issue reported on code.google.com by kenneth@hexad.dk on 28 Oct 2011 at 8:20

GoogleCodeExporter commented 8 years ago
Ragged/jagged arrays would be hard, but as you allude to, it might be possible 
to offer support for rectangular arrays. 

A few things to worry about - I am on vacation without source access - well I 
have access, but it is painful to read from my phone ;)

So first we will need to allow the bytecode through; this probably just means 
removing the code check in MethodModel.

We also need a strategy for dealing with the fact that an array of arrays of 
primitives is not one contiguous region. One option (slow) would be to copy 
the data to a single array.  This might allow a quick hack, but would be 
expensive.  Alternatively we could mimic the Java array of arrays in OpenCL 
land. This would be best, I think. 

Of course with a single-dimension array we need to pin the array to stop Java's 
GC from moving the array that we passed to the GPU; we generally don't need to 
worry about the array being collected because (we assume) there is a live 
reference from the Java code.  With a two-dimensional array we would need to 
pin the array of arrays and then loop through all of the inner arrays and pin 
them.  This is clearly possible, but it would require us to track the array 
information from the JNI side. Definitely look in the aparapi.cpp code to see 
how we handle the pinning.

I would try this by creating test code and then just stepping through the 
existing code, fixing code parsing (should be fine), bytecode folding (should 
be OK, but you might need help with a transform - my guess is this bytecode 
creates lots of DUP instructions which require transforms - I would be happy 
to help with this), then code creation and the JNI changes. 

It would be a fun project and I think it would be doable. 

Gary

Original comment by frost.g...@gmail.com on 28 Oct 2011 at 3:11

GoogleCodeExporter commented 8 years ago
We have experienced the same problem, using long[][], which results in an NPE. 
Our solution for now is to create a single long[] outside of Aparapi and manage 
the data transfer via explicit mode, but this is obviously non-ideal.

Original comment by ryan.lam...@gmail.com on 4 Nov 2011 at 7:00

GoogleCodeExporter commented 8 years ago
I have checked in changes to enable detection of accesses through 
multidimensional arrays.  Previously these were not detected until we 
attempted to create the OpenCL.  Now we should be able to detect them as we 
fold expressions. 

Added two JUnit codegen tests (Access2DIntArray and Assign2DIntArray).

Original comment by frost.g...@gmail.com on 7 Nov 2011 at 7:20

GoogleCodeExporter commented 8 years ago
Keeping this open until the enhancement itself is implemented.

Original comment by frost.g...@gmail.com on 7 Nov 2011 at 7:23

GoogleCodeExporter commented 8 years ago
Excellent, we look forward to testing it out

Original comment by ryan.lam...@gmail.com on 8 Nov 2011 at 5:01

GoogleCodeExporter commented 8 years ago
I have verified with the latest trunk code that two-dimensional arrays are 
being detected during OpenCL creation.

In addition to the NPE that is still thrown as described above, a new 
exception is thrown if you try to assign into a two-dimensional array within 
your kernel:

com.amd.aparapi.ClassParseException: Can't assign to two dimension array

Is this ticket intended to support both use cases?

Original comment by ryan.lam...@gmail.com on 13 Nov 2011 at 6:40

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Just a thought: couldn't Aparapi just copy all of the 2D/3D/etc. arrays 
into a single array wrapped in an NIO buffer, and then use the NIO buffer 
with the GPU? I believe this is how frameworks like JNA deal with 
multi-dimensional arrays. Not ideal, but it should work as expected.

Original comment by ryan.lam...@gmail.com on 14 Feb 2012 at 9:47
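The NIO idea above could be sketched like this (illustrative only, assuming a rectangular array; not how Aparapi is actually implemented):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Illustrative sketch of the NIO idea (not Aparapi's implementation):
// copy a rectangular float[][] into one direct buffer so native code
// sees a single contiguous region, at the cost of trading arr[i][j]
// syntax for buffer get/put calls.
public class NioFlatten {

    static FloatBuffer toDirectBuffer(float[][] src) {
        final int rows = src.length;
        final int cols = src[0].length;
        final FloatBuffer buf = ByteBuffer
                .allocateDirect(rows * cols * 4) // 4 bytes per float
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (float[] row : src) {
            buf.put(row); // one bulk copy per row
        }
        buf.flip();
        return buf;
    }
}
```

A direct buffer's storage lives outside the Java heap, which is what lets the native side use it without pinning each inner array.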

GoogleCodeExporter commented 8 years ago
We can do this.  Tom (in the discussion list) did this for 1D buffers
and this approach would work for 2D/3D/nD ;)  but sacrificed array
access syntax (arr[i]) for explicit method access arrBuf.get(i).

Tom's approach is probably the most likely to give performance, because we can 
guarantee contiguous memory accesses.

Anytime we take arr[][] or arr[][][] and consider copying behind the
scenes, we have a lot of copying to do and a lot of JNI-based array
accesses to do the copying.

But (as you indicated earlier) I really am only 'assuming' this would
be slower (based on #'s from previous JNI/array work) so it would be
interesting just to measure the cost.

If you had an example application that you would like to test maybe between us 
we can hack up a version of Aparapi which would do this. 

Gary

Original comment by frost.g...@gmail.com on 14 Feb 2012 at 10:04

GoogleCodeExporter commented 8 years ago
Here is some interesting example code:

http://java.sun.com/developer/onlineTraining/Programming/JDCBook/jnistring.html#pin
http://java.sun.com/developer/onlineTraining/Programming/JDCBook/jnistring.html#multi

Those two examples together should allow us to use arrays of arrays?

Original comment by ryan.lam...@gmail.com on 3 Mar 2012 at 4:33

GoogleCodeExporter commented 8 years ago
We have checked in a branch that adds multidimensional array support for 
1D, 2D and 3D arrays.

There is OpenMP support added to the source, but commented out at the moment. 
The OpenMP code has been tested successfully on Windows 7 and OS X.

This code specifically addresses our Java heap memory usage issues with forced 
1D array creation in Java code when we have data that initially starts as 
multidimensional arrays, possibly wrapped in other objects. This branch moves 
the creation and copying of Java arrays to native code, so heap memory 
pressure is significantly alleviated or eliminated.

Original comment by ryan.lam...@gmail.com on 9 Apr 2013 at 5:42

GoogleCodeExporter commented 8 years ago
In the current trunk code, I am working on merging the multi-dimensional array 
branch. But I am also still experimenting with OpenMP. We have it working in 
the branch, but I'm debating whether it is the most efficient way of using 
OpenMP...

I'm looking at ways to use AttachCurrentThread() inside the OpenMP parallel 
for loops. Do you have a suggestion (based on the current trunk code) for the 
best place to cache a global pointer to the JavaVM (JavaVM *jvm)?

Original comment by ryan.lam...@gmail.com on 15 Apr 2013 at 8:00

GoogleCodeExporter commented 8 years ago
I think it would be appropriate to cache this in each JNIContext.

I think it is fairly cheap to extract from JNIEnv using 

 jniEnv->GetJavaVM(JavaVM **vm);

For the upcoming 'lambda branch', where we have to run as an agent, we can 
grab a copy when the agent is loaded. 

Original comment by frost.g...@gmail.com on 15 Apr 2013 at 8:54

GoogleCodeExporter commented 8 years ago
All relevant code has been merged from the MultiDimensionalArray Branch to 
Trunk. A few of the notable changes:

- There is a new AparapiBuffer class that is used for managing arrays

- OpenMP+JNI friendly code has been included but commented out until I can 
verify the most performant way to implement this feature (others are welcome to 
investigate this too)

- The arrays are limited to 1D, 2D and 3D to match current Range limitations 
and to avoid excessive new code to support n-dimensional arrays

- Array copying is now performed in C++, avoiding the Java heap memory 
problems caused by generating and copying Java arrays multiple times just to 
produce the 1D arrays needed exclusively for GPU usage

- There is a new project in /samples/mdarray that was used to comprehensively 
test the new multi-dimensional feature. There will most likely be edge cases 
we didn't test, but we should have covered most of the common use cases.

- Any and all input, code review and changes (if necessary) are welcome
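For reference, the row-major index math that a single contiguous buffer needs in order to stand in for a rectangular 3D array might look like this (illustrative sketch only, not the actual AparapiBuffer code):

```java
// Illustrative sketch (not the actual AparapiBuffer code): the
// row-major index math a single contiguous buffer needs in order to
// stand in for a rectangular int[nx][ny][nz].
public class Index3D {

    // Element (i, j, k) of the 3D array lives at this flat offset.
    static int index(int i, int j, int k, int ny, int nz) {
        return (i * ny + j) * nz + k;
    }

    static int[] flatten(int[][][] src) {
        final int nx = src.length;
        final int ny = src[0].length;
        final int nz = src[0][0].length;
        final int[] flat = new int[nx * ny * nz];
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                for (int k = 0; k < nz; k++) {
                    flat[index(i, j, k, ny, nz)] = src[i][j][k];
                }
            }
        }
        return flat;
    }
}
```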

Original comment by ryan.lam...@gmail.com on 19 Apr 2013 at 8:17

GoogleCodeExporter commented 8 years ago
It appears we missed kernel.put() and kernel.get() for multi-dimensional array 
support. Bummer.

Gary, do you have any time to add this missing piece? I'll try to take a look 
at it as soon as I can.

Original comment by ryan.lam...@gmail.com on 19 Apr 2013 at 8:43

GoogleCodeExporter commented 8 years ago
Okay, that may have been easier than I first thought. I just checked-in an 
initial pass at adding the missing method parameters. Will need more testing.

Original comment by ryan.lam...@gmail.com on 19 Apr 2013 at 8:51

GoogleCodeExporter commented 8 years ago
Nice work Ryan.

I just saw your ask for help with kernel.get()/put(), then (almost
immediately) you had the fix.  Looks like you found it ;)

I will take the trunk for a spin this weekend.  Thank you (and Steven) for
this effort.  I know I have been a 'slacko' on this.

I will test on Win64+Linux64+MacOS this weekend.

Original comment by frost.g...@gmail.com on 19 Apr 2013 at 9:04

GoogleCodeExporter commented 8 years ago
Ryan.  Compilation failed for me on Win64:
 LINK : fatal error LNK1104: cannot open file 'VCOMP.lib'

Original comment by frost.g...@gmail.com on 19 Apr 2013 at 9:11

GoogleCodeExporter commented 8 years ago
So I think this is required for Visual Studio builds using OpenMP.

I looked around briefly, but do not have this file.  I am using Visual Studio 
Express (I have always built with this in the past).

Where is your VComp.lib?

Gary  

Original comment by frost.g...@gmail.com on 19 Apr 2013 at 9:14

GoogleCodeExporter commented 8 years ago
Hmm, I am using VS Professional 2012. As for VS Express...MS leaves out a lot 
of important capabilities in the Express edition, unfortunately. Is using VS 
Express a requirement for Aparapi?

It is a simple build.xml change to disable OpenMP support. I don't have it 
open right now, but you can simply remove or comment out <arg value="/openmp" />.

If you can't get to it today, I can try and check in the change tonight.

Original comment by ryan.lam...@gmail.com on 19 Apr 2013 at 10:26

GoogleCodeExporter commented 8 years ago
I checked in the commented-out configuration(s) in build.xml. Let me know if 
you encounter any other issues.

Original comment by ryan.lam...@gmail.com on 20 Apr 2013 at 12:25

GoogleCodeExporter commented 8 years ago
Ryan. 

Worked like a charm on Linux 64.  I added an mdarray.sh.

Regarding your 'VS Express' question (#21 above): I won't say that VS Express 
is a requirement, but I would hate to force developers to have to have the 
professional version of VS to contribute to Aparapi. 

I for example have VS Express at home, and that is where I tend to do my 
Aparapi builds.

What happens if we build against this Vcomp lib and create aparapi_x86_64.dll 
as part of a binary distribution?  Is the 'user' going to need to have a 
dynamic version of this?  Maybe my OpenMP naivety is showing here.

Can we try this?  Can you build aparapi_x86_64.dll for me (using your OpenMP-
enabled VS environment)?  I will then run it on a machine which only has 
'basic' runtime support. I want to make sure there is nothing in 
aparapi_x86_64.dll from your env which will stop me from using it. 

I guess I could use http://www.dependencywalker.com/ to check this.  

Original comment by frost.g...@gmail.com on 21 Apr 2013 at 2:50

GoogleCodeExporter commented 8 years ago
Gary,

I can do that and attach the output to this ticket, hopefully tomorrow when I 
get back to work.

As far as developers building Aparapi are concerned, this is just a compiler-
directive change, so we could have something as simple as "MSVC OpenMP" and 
"MSVC Non-OpenMP" Ant targets. This problem really only affects Windows, as 
GCC has supported OpenMP since 4.2. Either way, the actual C++ code does not 
have to change, since the OpenMP pragmas are ignored by compilers without 
OpenMP enabled.

http://gcc.gnu.org/wiki/openmp

Original comment by ryan.lam...@gmail.com on 21 Apr 2013 at 9:45

GoogleCodeExporter commented 8 years ago
It is possible users may need to install Visual C++ Redistributable for Visual 
Studio 2012 Update 1: 
http://www.microsoft.com/en-us/download/details.aspx?id=30679

"The Visual C++ Redistributable Packages install runtime components of Visual 
C++ libraries that are required to run applications developed using Visual 
Studio 2012 on a computer that does not have Visual Studio 2012 installed. 
These packages install runtime components of the C Runtime (CRT), Standard C++, 
ATL, MFC, C++ AMP, and OpenMP libraries."

Original comment by ryan.lam...@gmail.com on 23 Apr 2013 at 6:23

GoogleCodeExporter commented 8 years ago
I prefer building two versions, aparapi_win_x86_64_openmp.zip and
aparapi_win_x86_64.zip, and letting the user choose.

We could even put two versions of the dll in the same zip and find some
way to select the appropriate one at runtime.

Gary

Original comment by frost.g...@gmail.com on 23 Apr 2013 at 6:29

GoogleCodeExporter commented 8 years ago
Gary,

I have attached a .dll that was built with Visual Studio 2012 Professional, 
Windows SDK 7.1A and OpenMP enabled. Let me know if you are able to execute it 
on your machine.

In the process, I have also cleaned up the trunk build.xml to query for SDKs 
properly when a machine has multiple SDKs installed :) I will check in those 
changes shortly. This only affects Windows builds at this time.

Original comment by ryan.lam...@gmail.com on 30 Apr 2013 at 7:23

GoogleCodeExporter commented 8 years ago
Also, let me put together a DLL with all of the OpenMP pragmas uncommented in 
AparapiBuffer.cpp so we can test that as well.

Original comment by ryan.lam...@gmail.com on 30 Apr 2013 at 7:59

GoogleCodeExporter commented 8 years ago
Try this DLL instead

Original comment by ryan.lam...@gmail.com on 1 May 2013 at 1:02

Attachments:

GoogleCodeExporter commented 8 years ago
Any results Gary?

Original comment by ryan.lam...@gmail.com on 8 May 2013 at 5:18

GoogleCodeExporter commented 8 years ago
Can we mark this issue complete? Is the only task left to update the Wiki/FAQ?

Original comment by pnnl.edg...@gmail.com on 1 Jun 2013 at 8:25

GoogleCodeExporter commented 8 years ago
Hi,

This is very nice.

When using a multi-dimensional array in the "run()" method, the program 
crashes when the "execute(N*N)" method is called twice or more.
Is there a way to call the "execute()" method multiple times?

Thanks.

Original comment by fukusui...@gmail.com on 9 Jun 2013 at 3:57

GoogleCodeExporter commented 8 years ago
Do you have a code example?

Original comment by pnnl.edg...@gmail.com on 9 Jun 2013 at 10:54

GoogleCodeExporter commented 8 years ago
Thank you for your response.

I simply call "execute(N*N)" twice in your sample code "mdarray", like this:

   public static void Zrun2D() {
      ....
      long gs = System.currentTimeMillis();
      final Kernel kernel = new ZMatMul2D(A, B, gpu, N);

      kernel.execute(N * N);
      //Do something.
      kernel.execute(N * N);

      gs = System.currentTimeMillis() - gs;
      ....
   }

Original comment by fukusui...@gmail.com on 10 Jun 2013 at 12:17

GoogleCodeExporter commented 8 years ago
Gary,

I think the only task left, besides investigating Licht Takeuchi's potential 
bug post, is to let me know if the OpenMP version works for you on Windows 7 
64-bit.

-Ryan

Original comment by pnnl.edg...@gmail.com on 12 Jul 2013 at 10:23

GoogleCodeExporter commented 8 years ago
I am sorry I have been tardy on this.  Yes, I can confirm that the OpenMP 
version does work (well, does not fail!) for me on Win 7. 

Sorry Ryan.

I think we can close this as done!

Gary

Original comment by frost.g...@gmail.com on 12 Jul 2013 at 11:29

GoogleCodeExporter commented 8 years ago
No problem, I thought you had maybe forgotten about this or that I had 
pestered you too much regarding the OpenMP testing.

The code that is currently checked into the repository is NOT OpenMP enabled. 
The OpenMP code and configuration are currently commented out. The code on my 
local machine is fully OpenMP enabled.

But there are still some questions on which I need your input:

- OpenMP does not necessarily speed up loops unless they are large: greater 
than a million iterations, or taking longer than 15ms, for example.

http://msdn.microsoft.com/en-us/library/fw509c3b.aspx

"Assuming an x64, single core, dual processor the threadpool takes about 16ms 
to startup. After that though there is very little cost for the threadpool.

When you compile with /openmp, the second call to test2 never runs any longer 
than if you compile with /openmp-, as there is no threadpool startup. At a 
million iterations the /openmp version is faster than the /openmp- version for 
the second call to test2, and at 25 iterations both /openmp- and /openmp 
versions register less than the clock granularity.

So if you have only one loop in your application and it runs in less than 15ms 
(adjusted for the approximate overhead on your machine), /openmp may not be 
appropriate, but if it's anything more than that, you may want to consider 
using /openmp."

- If we do want to enable OpenMP for large loops (I believe we'd have to test 
all 3 dimensions to see whether any of them qualifies individually or they 
qualify collectively) and disable it for smaller loops, then I/we will need to 
add code to the JNI looping sections to implement this check.

Original comment by pnnl.edg...@gmail.com on 14 Jul 2013 at 10:55

GoogleCodeExporter commented 8 years ago
We can also exercise some of this functionality to test for OpenMP:

http://msdn.microsoft.com/en-us/magazine/cc163717.aspx

"When the /openmp switch is invoked, the compiler defines the symbol _OPENMP, 
which may be used to detect when OpenMP is enabled using #ifndef _OPENMP."

I believe this is consistent with how GNU GCC works as well.

Original comment by pnnl.edg...@gmail.com on 14 Jul 2013 at 11:50