unbornchikken / NOOOCL

Node.js Object Oriented OpenCL Bindings
MIT License

Example gets stuck on Windows / io.js / Radeon R9 #2

Closed metabench closed 9 years ago

metabench commented 9 years ago

While the particle simulator sounds like an interesting project, it's not best suited to getting a quick understanding of how to use nooocl.

Examples such as

would be fast (for you) to make, and would be a good illustration of how to run OpenCL with NOOOCL.

Having at least one such complete example in the readme would be useful.

If you do one example, I could then attempt the other ones. Then you could look at them to see if there are different ways of doing the same things, using different parts of the NOOOCL / OpenCL interface.

unbornchikken commented 9 years ago

Thanks for the feedback.

I have just released 0.9.10 with an example: https://github.com/unbornchikken/NOOOCL#examples

If you have time to write some others, please PR'em into the examples directory with a small readme file. Thanks a lot!

metabench commented 9 years ago

Many thanks. That example will be a great help.

metabench commented 9 years ago

I've just run the example, and it looks like I'm getting a similar problem to what I encountered before.

I get this output:

C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>iojs vecadd.js
Running on device: Pitcairn - AMD Accelerated Parallel Processing
Building ...
(Everything after this point is asynchronous.)
"C:\Users\James\AppData\Local\Temp\OCL948T5.cl", line 1: warning: OpenCL
          extension is now part of core
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                           ^
Build completed.
Launching the kernel.
Waiting for result.

C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>

It's not reaching the callback method where it logs the result to the console. Here it's the enqueueReadBuffer callback not being called.

unbornchikken commented 9 years ago

I happen to be able to test it on a Pitcairn at work later. On my side it works with a Barts and on the CPU. Until then, please try deleting the node_modules folder, then run npm install and retest. Also, which version of io.js are you using? This seems more like an ffi/libuv issue than a NOOOCL one, because you got a deadlock here.

If nothing helps, please try it on the CPU by commenting out this line: https://github.com/unbornchikken/NOOOCL/blob/master/examples/vector-addition/vecAdd.js#L41

metabench commented 9 years ago

Wouldn't an npm update nooocl be OK?

unbornchikken commented 9 years ago

Nope. Try a full install plz.

metabench commented 9 years ago

I deleted the nooocl directory from node_modules and then reinstalled it. It would be inconvenient to delete the whole node_modules folder and I'll only do it if I know it's necessary.

metabench commented 9 years ago

Still the same result.

unbornchikken commented 9 years ago

Because if you upgrade io.js regularly but forget to reinstall all of the native modules in your applications, then weird things can happen. Like this one.

metabench commented 9 years ago

I'm using iojs 1.6.2.

unbornchikken commented 9 years ago

Ok, I'll try to reproduce it later today with Pitcairn on Windows x64 io.js 1.6.2. I hope that I'll get this too.

unbornchikken commented 9 years ago

Until then, please try it with the CPU fallback. If you still experience it, then it has to be an ffi issue. I'm one of the recent ffi module contributors, so if that is the case I will be able to fix it ASAP.

unbornchikken commented 9 years ago

I need your help with this, please. I was able to run the example with Pitcairn, but the machine where it is available runs Linux Mint 17.1, and I'm not experiencing this issue on it. I've also tested it on Windows with an Intel chip and the example ran flawlessly. Both cases ran on io.js 1.6.2.

I need you to try two things:

Thanks a lot!

metabench commented 9 years ago

C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>iojs vecAdd.js
Running on device: Pitcairn - AMD Accelerated Parallel Processing
Building ...
(Everything after this point is asynchronous.)
"C:\Users\James\AppData\Local\Temp\OCL1088T5.cl", line 1: warning: OpenCL
          extension is now part of core
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                           ^
Build completed.
Launching the kernel.
Waiting for result.

C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>echo %ERRORLEVEL%
-1073741819

C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>

Thank you

unbornchikken commented 9 years ago

That's an Access Violation thrown from somewhere in the ffi module. Would you try it with cpu fallback?
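
For reference, the exit code reported by echo %ERRORLEVEL% can be decoded by reinterpreting the signed value as an unsigned 32-bit number, which gives the Windows status code 0xC0000005 (access violation). A minimal sketch; the helper name toNtStatusHex is hypothetical and not part of NOOOCL or ffi:

```javascript
// Hypothetical helper: reinterpret a signed 32-bit process exit code
// as an unsigned value and format it as an 8-digit hex status code.
function toNtStatusHex(exitCode) {
  // ">>> 0" converts the number to an unsigned 32-bit integer.
  const unsigned = exitCode >>> 0;
  return '0x' + unsigned.toString(16).toUpperCase().padStart(8, '0');
}

console.log(toNtStatusHex(-1073741819)); // 0xC0000005
```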

metabench commented 9 years ago

I just commented out that line so it tries to run on the CPU. Unfortunately it does not get as far this time:

C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>iojs vecAdd.js
No GPU devices has been found, searching for a CPU fallback.
Running on device: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz - Intel(R) OpenCL
Building ...
(Everything after this point is asynchronous.)

C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>

unbornchikken commented 9 years ago

May I ask what the result of echo %ERRORLEVEL% is for the second one?

metabench commented 9 years ago

After running that one:

C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>echo %ERRORLEVEL%
0

unbornchikken commented 9 years ago

Finally I'm able to reproduce this on Windows with an NVidia 630M. Calling clSetEventCallback crashes the process with an access violation on this platform. This needs a deep investigation though.

metabench commented 9 years ago

Very interesting. Thanks.

unbornchikken commented 9 years ago

I have investigated the issue. With Node 0.10.3x and ffi 1.2.7, I get:

Running on device: Pitcairn - AMD Accelerated Parallel Processing
Building ...
(Everything after this point is asynchronous.)
"C:\Users\Gabor\AppData\Local\Temp\OCL4000T5.cl", line 1: warning: OpenCL
          extension is now part of core
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable
                           ^
Build completed.
Launching the kernel.
Waiting for result.
Final result: 1

So it's definitely an ffi module vs. Windows issue. The module was recently upgraded to support io.js and Node.js 0.12, but it seems that was not entirely successful.

I'm planning to switch from ffi bindings to native https://github.com/mikeseven/node-opencl based bindings.

metabench commented 9 years ago

It will be interesting to see how this project develops. Are you planning for this project to cover much the same ground as node-opencl and use some similar techniques, but be your own implementation?

I'm working on my own system of running OpenCL through iojs at the moment. Maybe it's worth me sharing it on Github soon.

unbornchikken commented 9 years ago

NOOOCL is a high-level, object-oriented library that is much simpler to use than the native C API. Currently I'm using the ffi module to access the C API. node-opencl will provide the same low-level API with the same interfaces. Essentially, that covers only the lib/cl11.js and lib/cl12.js files; NOOOCL is far more than that.

unbornchikken commented 9 years ago

I took a look at node-opencl. Right now it has issues of its own: it's in an early beta state and hasn't even been published to npm yet. Although in the long term I'm willing to switch to it, right now the only option is to stay with ffi. Over the Easter holiday I'll try to find some time to compile debug builds of io.js and ffi for Windows, and go after this issue with the Visual C++ debugging tools.

unbornchikken commented 9 years ago

Good news: I have successfully identified the source of this. It's actually not a bug left in the ffi module; libuv, the Node event loop library, has a strange bug that caused our issue here. I managed to implement a workaround for this in the ffi package, and will propose a PR today. I'm going to publish the fix to my ffi fork (ffi-io on npm); until my PR gets merged, it will do for NOOOCL. I have to go now, but I'm going to release a fix for NOOOCL later today. Thanks for your patience.

metabench commented 9 years ago

I may have seemed patient, but in fact I've not been patient at all. I've been coding the OpenCL calls from a C++ addon. I tried various options to make it easier, including yours, but none of them worked for me. When I was able to run that Oak Ridge example I based my code on that.

I've made a nice and fairly simple VM that lets the user set up OpenCL kernels and buffers from JavaScript, and execute them.

I've been having more ideas about writing fast code easily, and I'm thinking that some compilation tools to compile to OpenCL would be really useful. In particular, I'm thinking that compiling a superset of TypeScript would be nice (superset that supports the variety of numeric data types).

metabench commented 9 years ago

You may be interested in the OpenCL experiments I've done, published at https://github.com/metabench/opencljs-experiments

I've not been using any other node libraries to connect with OpenCL (apart from NAN for helping with the C++ bindings).

I'm not really planning on making the whole of OpenCL available, but at this stage the goal is to provide convenient means to run code quickly on the GPU.

I noticed that with the vector addition OpenCL code, only one line was actually doing the addition. I'm thinking about how OpenCL C, or code that compiles to it could be written inline with JavaScript code.

At the moment, I'm looking into what pre-existing conventions exist, and what conventions would make it fast to port from JavaScript code to this format.

Also, some JavaScript code looks very much like C, in isolation from other code. The code that actually does a vector addition, c[id] = a[id] + b[id];, could even be JavaScript.

unbornchikken commented 9 years ago

0.9.12 has been published, that should fix this. Please try.

I think GPGPU coding is never going to be easy. Take an obvious example: given 5000 numbers, try to calculate their average on the GPU quickly. You are never going to succeed because of hardware restrictions: there is no way to synchronize (efficiently) across GPU cores, so there is no way to calculate one average; you'll get groups of averages.

I get what you want to achieve. There are some excellent solutions like this, but not for JavaScript, because its dynamic nature conflicts with the C/C++ syntax and semantics of kernel languages.

CUDAfy https://cudafy.codeplex.com/ : there you can write GPU kernels in C#.
C++ AMP https://msdn.microsoft.com/en-us/library/hh265136.aspx : there you can write (and profile and even debug!) GPU kernels in C++11.

I can see that you can handle yourself well with C++, so I think C++ AMP is what you are looking for. That's an awesome bit of technology for sure. Its only drawback compared to OpenCL is the reduced performance, but you always pay that price for this kind of syntactic sugar.

Regarding NOOOCL: please don't judge a module when it is in its early beta stage. The whole Node native addon community has suffered from the recent backward-incompatible V8/libuv changes. But those issues will get addressed for sure, like this one.

I've made another example for you that shows why I used promises instead of callbacks. With ES6 generators and promises, your asynchronous code will look exactly like synchronous code, and you get the same control flow constructs asynchronously (like for, if, while, etc.).
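
A minimal sketch of that idea, not the actual example from the repository. The names run and delayedValue are hypothetical: run drives a generator that yields promises, and delayedValue stands in for any asynchronous operation such as a buffer read. A promise library's coroutine helper would normally replace the hand-written run.

```javascript
// Hypothetical coroutine runner: drives a generator that yields promises,
// resuming it with each resolved value (or throwing the rejection into it).
function run(generatorFn) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn();
    function step(method, value) {
      let result;
      try {
        result = gen[method](value);
      } catch (err) {
        reject(err);
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      Promise.resolve(result.value).then(
        v => step('next', v),
        e => step('throw', e)
      );
    }
    step('next', undefined);
  });
}

// Stand-in for an asynchronous operation (hypothetical).
function delayedValue(value) {
  return new Promise(resolve => setTimeout(() => resolve(value), 10));
}

// An ordinary for loop whose body waits for an asynchronous result.
const done = run(function* () {
  let sum = 0;
  for (let i = 1; i <= 3; i++) {
    sum += yield delayedValue(i);
  }
  return sum;
});

done.then(sum => console.log('sum:', sum)); // sum: 6
```

The loop body reads like synchronous code, yet each yield hands control back to the event loop until the promise settles.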

metabench commented 9 years ago

Yes, averaging numbers sounds more difficult in some ways. I'm sure it's solvable, but at what complexity cost?

The way I'm thinking of uses a for loop within the OpenCL loop to calculate averages for subsections, then those will be averaged. Although conceptually solving this on a GPU is more complicated than doing it sequentially, I also think that complexity can be expressed in a simpler way than it is at the moment.
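
The two-stage approach can be simulated on the CPU in plain JavaScript. This is only a sketch of the technique, not OpenCL code: each call to the inner loop plays the role of one work-group, and the group size of 256 is an assumed value. It combines partial sums rather than partial averages, because averaging the averages is only exact when every subsection has the same length.

```javascript
// Stage 1: one partial sum per subsection (the work one group would do).
function partialSums(data, groupSize) {
  const sums = [];
  for (let start = 0; start < data.length; start += groupSize) {
    const end = Math.min(start + groupSize, data.length);
    let s = 0;
    for (let i = start; i < end; i++) s += data[i];
    sums.push(s);
  }
  return sums;
}

// Stage 2: combine the partial sums and divide by the element count.
function average(data, groupSize) {
  let total = 0;
  for (const s of partialSums(data, groupSize)) total += s;
  return total / data.length;
}

// 5000 numbers: 1, 2, ..., 5000.
const data = Float32Array.from({ length: 5000 }, (_, i) => i + 1);
console.log(average(data, 256)); // 2500.5
```

On a real device, stage 1 would be a kernel writing one value per work-group into a small buffer, and stage 2 could run either as a second kernel launch or on the host.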

C++ AMP looks nice in theory, but I don't think it's open-source and cross-platform. Maybe that could change sometime (soon). Maybe I spoke too soon - I just saw this and it looks very interesting: http://www.amd.com/en-us/press-releases/Pages/developer-language-2014aug26.aspx

Me being able to handle myself with C++ is a relatively recent phenomenon, I'm pleased it's taking place. I don't like C++ that much in terms of syntax. I think some C++ extracts or functions can be very readable, and also look similar in lots of other languages and are easy to port between languages.

I have not drawn much of a judgement of NOOOCL apart from thinking it's got lots of potential. Looking at the source code, it covers a lot of the OpenCL API, providing ways of calling it asynchronously from JavaScript. I'm very interested in setting up my functions in C++ to work asynchronously and plan on using libuv for that. I'm not sure at this stage how much would need to be done with asynchronous callbacks in particular places or how much it will help, but I'll investigate further.

Writing asynchronous ES6 code like you have described sounds very good.

The OpenCL language looks like it can support a fair few things, with simple code, except lots of the code around it looks complex. It looks like there are a fair few OO features in OpenCL, including V 2.0, but I'm not quite sure what's in what version or even how to choose which version to run.

Now I'm expressing the kernel with some quite simple code that fits in reasonably well with JavaScript to add numbers:

var k2s = write_kernel('vecAdd', [['a', Float32Array], ['b', Float32Array]], ['res', Float32Array], `
  res[id] = a[id] + b[id];
`);
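
One plausible way such a helper could expand the short body into full OpenCL C is sketched below. This is an assumption for illustration only: writeKernelSketch and the C_TYPES table are hypothetical, and the real write_kernel may work quite differently.

```javascript
// Hypothetical mapping from JavaScript typed arrays to OpenCL C element types.
const C_TYPES = new Map([
  [Float32Array, 'float'],
  [Int32Array, 'int'],
  [Uint32Array, 'uint'],
]);

// Hypothetical sketch: wrap a one-line body in a complete kernel definition.
function writeKernelSketch(name, inputs, output, body) {
  const params = inputs
    .map(([argName, type]) => `__global const ${C_TYPES.get(type)}* ${argName}`)
    .concat(`__global ${C_TYPES.get(output[1])}* ${output[0]}`);
  return [
    `__kernel void ${name}(${params.join(', ')}) {`,
    '  size_t id = get_global_id(0);', // index of the current work-item
    `  ${body.trim()}`,
    '}',
  ].join('\n');
}

const source = writeKernelSketch(
  'vecAdd',
  [['a', Float32Array], ['b', Float32Array]],
  ['res', Float32Array],
  `
  res[id] = a[id] + b[id];
`);
console.log(source);
```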

unbornchikken commented 9 years ago

I think we should continue this otherwise interesting discussion in another issue thread, not this one. This one is about the crash. Have you had the chance to try the fix that I released earlier?

metabench commented 9 years ago

OK, I understand.

No, I've not tried it yet. I'll try it soon, but am busy right now with my OpenCL system and am going out fairly soon.

unbornchikken commented 9 years ago

I verified this on every machine on which I got this crash before. It seems that with the new version the crash is gone.