Closed metabench closed 9 years ago
Thanks for feedback.
I have just released 0.9.10 with an example: https://github.com/unbornchikken/NOOOCL#examples
If you have time to write some others, please PR'em into the examples directory with a small readme file. Thanks a lot!
Many thanks. That example will be a great help.
I've just run the example, and it looks like I'm getting a similar problem to what I encountered before.
I get this output:
C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>iojs vecadd.js
Running on device: Pitcairn - AMD Accelerated Parallel Processing
Building ...
(Everything after this point is asynchronous.)
"C:\Users\James\AppData\Local\Temp\OCL948T5.cl", line 1: warning: OpenCL
extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
Build completed.
Launching the kernel.
Waiting for result.
C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>
It's not reaching the callback where it logs the result to the console. Here, it's the enqueueReadBuffer callback that is not being called.
I happen to be able to test it on a Pitcairn at my work later. On my side it works with a Barts and on the CPU. Until then, please try deleting the node_modules folder, then do an npm install and retest. Also, which version of iojs are you using? This seems more like an ffi/libuv issue than a NOOOCL one, because you got a deadlock here.
If nothing helps, please try it on the CPU by commenting out this line: https://github.com/unbornchikken/NOOOCL/blob/master/examples/vector-addition/vecAdd.js#L41
Wouldn't an npm update nooocl be OK?
Nope. Try a full install plz.
I deleted the nooocl directory from node_modules and then reinstalled it. It would be inconvenient to delete the whole node_modules folder and I'll only do it if I know it's necessary.
Still the same result.
Because if you upgrade io.js regularly but forget to reinstall all of the native modules in your applications, then weird things can happen. Like this one.
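As a side note on why a reinstall can matter: native addons are compiled against a specific native-module ABI version, which the runtime reports in `process.versions.modules`. A small sketch for checking which runtime and ABI a machine is actually using (it does not inspect the installed addons themselves):

```javascript
// Print the runtime version and the native-module ABI version it expects.
// Native addons built for a different ABI version need to be rebuilt or reinstalled.
console.log('runtime:', process.version);
console.log('native module ABI:', process.versions.modules);
```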
I'm using iojs 1.6.2.
Ok, I'll try to reproduce it later today with Pitcairn on Windows x64 io.js 1.6.2. I hope that I'll get this too.
Until then, please try it with the CPU fallback. If you still experience it, then it has to be an ffi issue. I'm one of the recent ffi module contributors, so I will be able to fix it ASAP if that is the case.
I need your help with this please. I was able to run the example with Pitcairn but the machine where it is available runs Linux Mint 17.1, and I'm not experiencing this issue on it. I've also tested it on Windows with an Intel chip and the example ran flawlessly. Both cases went on io.js 1.6.2.
I need you to try two things:
echo %ERRORLEVEL%
then please post the result here, it's going to be a number. Thanks a lot!
C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>iojs vecAdd.js
Running on device: Pitcairn - AMD Accelerated Parallel Processing
Building ...
(Everything after this point is asynchronous.)
"C:\Users\James\AppData\Local\Temp\OCL1088T5.cl", line 1: warning: OpenCL
extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
Build completed.
Launching the kernel.
Waiting for result.
C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>echo %ERRORLEVEL%
-1073741819
C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>
Thank you
That's an Access Violation thrown from somewhere in the ffi module. Would you try it with cpu fallback?
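For reference, the exit code above is easier to recognize as an unsigned 32-bit value: -1073741819 is 0xC0000005, the Windows status code for an access violation. A minimal sketch of the conversion (the helper name `toNtStatusHex` is made up for illustration):

```javascript
// Reinterpret a signed 32-bit Windows exit code as unsigned hexadecimal.
function toNtStatusHex(exitCode) {
  // ">>> 0" converts the number to an unsigned 32-bit integer.
  const unsigned = exitCode >>> 0;
  return '0x' + unsigned.toString(16).toUpperCase().padStart(8, '0');
}

console.log(toNtStatusHex(-1073741819)); // 0xC0000005
```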
I just commented out that line so it tries to run on the CPU. Unfortunately it does not get as far this time:
C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>iojs vecAdd.js
No GPU devices has been found, searching for a CPU fallback.
Running on device: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz - Intel(R) OpenCL
Building ...
(Everything after this point is asynchronous.)
C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>
May I ask what the result of echo %ERRORLEVEL% is for the second one?
After running that one:
C:\Users\James\Dropbox\metabench\metabench\jsgui\apps\ocl>echo %ERRORLEVEL%
0
Finally I'm able to reproduce this on Windows with an NVidia 630M. Calling clSetEventCallback crashes the process with an access violation on this platform. This needs a deep investigation though.
Very interesting. Thanks.
I have investigated the issue. With Node 0.10.3x and ffi 1.2.7, I get:
Running on device: Pitcairn - AMD Accelerated Parallel Processing
Building ...
(Everything after this point is asynchronous.)
"C:\Users\Gabor\AppData\Local\Temp\OCL4000T5.cl", line 1: warning: OpenCL
extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
Build completed.
Launching the kernel.
Waiting for result.
Final result: 1
So it's definitely an ffi module vs. Windows issue. The module was recently upgraded to support io.js and Node.js 0.12, but it seems that was not entirely successful.
I'm planning to switch from ffi bindings to native https://github.com/mikeseven/node-opencl based bindings.
It will be interesting to see how this project develops. Are you planning for this project to cover much the same ground as node-opencl and use some similar techniques, but be your own implementation?
I'm working on my own system of running OpenCL through iojs at the moment. Maybe it's worth me sharing it on Github soon.
NOOOCL is a high-level, object-oriented library that is much simpler to use than the native C API. Currently I'm using the ffi module to access the C API. Node-opencl will provide the same low-level API with the same interfaces. Essentially that's only the lib/cl11.js and lib/cl12.js files; NOOOCL is far more than that.
I took a look at node-opencl. Right now it has issues of its own: it's in an early beta state and hasn't even been published to npm yet. Although in the long term I'm willing to switch to it, right now the only option is to stay with ffi. Over the Easter holiday I'll try to find some time to compile a debug build of io.js and ffi for Windows, and go after this issue with the Visual C++ debugging tools.
Good news, I have successfully identified the source of this. It's actually not a bug left in the ffi module: libuv, the Node event loop handling library, has a strange bug that caused our issue here. I managed to implement a workaround for this in the ffi package, and will propose a PR today. I'm going to publish the fix to my ffi fork (ffi-io on npm); until my PR gets merged, that will do for NOOOCL. I have to go now, but I'm going to release a fix for NOOOCL later today. Thanks for your patience.
I may have seemed patient, but in fact I've not been patient at all. I've been coding the OpenCL calls from a C++ addon. I tried various options to make it easier, including yours, but none of them worked for me. When I was able to run that Oak Ridge example I based my code on that.
I've made a nice and fairly simple VM that lets the user set up OpenCL kernels and buffers from JavaScript, and execute them.
I've been having more ideas about writing fast code easily, and I'm thinking that some compilation tools to compile to OpenCL would be really useful. In particular, I'm thinking that compiling a superset of TypeScript would be nice (superset that supports the variety of numeric data types).
You may be interested in the OpenCL experiments I've done, published at https://github.com/metabench/opencljs-experiments
I've not been using any other node libraries to connect with OpenCL (apart from NAN for helping with the C++ bindings).
I'm not really planning on making the whole of OpenCL available, but at this stage the goal is to provide convenient means to run code quickly on the GPU.
I noticed that with the vector addition OpenCL code, only one line was actually doing the addition. I'm thinking about how OpenCL C, or code that compiles to it could be written inline with JavaScript code.
At the moment, I'm looking into what pre-existing conventions exist, and what conventions would make it fast to port from JavaScript code to this format.
Also, some JavaScript code looks very much like C, in isolation from other code. The code that actually does a vector addition, c[id] = a[id] + b[id];, could even be JavaScript.
0.9.12 has been published, that should fix this. Please try.
I think coding GPGPU is never going to be easy. Take an obvious example: given 5000 numbers, try to calculate their average on the GPU fast. You are never going to succeed because of hardware restrictions: there is no way to synchronize (efficiently) across GPU cores, so there is no way to calculate one average, you'll get groups of averages.
I get what you want to achieve. There are some excellent solutions like this, but not for JavaScript, because its dynamic nature conflicts with the C/C++ syntax and semantics of kernel languages.
CUDAfy https://cudafy.codeplex.com/ : there you can write GPU kernels in C#. C++ AMP https://msdn.microsoft.com/en-us/library/hh265136.aspx : there you can write (and profile and even debug!) GPU kernels in C++11.
I can see that you can handle yourself well with C++, so I think C++ AMP is what you are looking for. That's an awesome bit of technology for sure. Its only drawback compared to OpenCL is the reduced performance, but you always pay that price for this kind of syntactic sugar.
Regarding NOOOCL: please don't judge a module while it is in its early beta stage. The whole Node native addon community suffered from the recent backward-incompatible V8/libuv changes. But those issues will get addressed for sure, like this one.
I've made another example for you that shows why I used promises instead of callbacks. With ES6 generators and promises, your asynchronous code will look exactly like synchronous code, and you get the same control flow constructs asynchronously (for, if, while, etc.).
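To illustrate the generators-plus-promises idea in general terms, here is a minimal sketch of a coroutine runner. It is not NOOOCL's API or the example referred to above; `run` and `fakeReadAsync` are hypothetical names, and `fakeReadAsync` only stands in for an asynchronous call such as a buffer read.

```javascript
// Drive a generator that yields promises, resuming it with each resolved value.
function run(generatorFn) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn();
    function step(method, arg) {
      let result;
      try {
        result = gen[method](arg); // gen.next(value) or gen.throw(error)
      } catch (err) {
        return reject(err);
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(
        value => step('next', value),
        err => step('throw', err)
      );
    }
    step('next');
  });
}

// Hypothetical stand-in for an asynchronous operation.
function fakeReadAsync(value) {
  return new Promise(resolve => setTimeout(() => resolve(value), 10));
}

const done = run(function* () {
  let sum = 0;
  for (let i = 1; i <= 3; i++) {
    sum += yield fakeReadAsync(i); // an ordinary for loop over async calls
  }
  return sum;
});

done.then(sum => console.log('sum:', sum));
```

Each `yield` pauses the generator until the promise settles, which is why ordinary `for`, `if`, and `try`/`catch` constructs keep working across asynchronous steps.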
Yes, averaging numbers sounds more difficult in some ways. I'm sure it's solvable, but at what complexity cost?
The way I'm thinking of uses a for loop within the OpenCL kernel to calculate averages for subsections, then those will be averaged. Although conceptually solving this on a GPU is more complicated than doing it sequentially, I also think that complexity can be expressed in a simpler way than it is at the moment.
C++ AMP looks nice in theory, but I don't think it's open-source and cross-platform. Maybe that could change sometime (soon). Maybe I spoke too soon - I just saw this and it looks very interesting: http://www.amd.com/en-us/press-releases/Pages/developer-language-2014aug26.aspx
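The two-stage idea can be sketched in plain JavaScript, with each loop iteration of the first stage standing in for one independent GPU work-group. This is only an illustration of the data flow under that assumption, not OpenCL code; the function names and the group size of 256 are arbitrary. It combines partial sums rather than partial averages, because the last group may be smaller than the others and an unweighted average of averages would then be slightly off.

```javascript
// Stage 1: each "work-group" sums its own slice (independent, so parallelizable).
function partialSums(values, groupSize) {
  const sums = [];
  for (let start = 0; start < values.length; start += groupSize) {
    const end = Math.min(start + groupSize, values.length);
    let sum = 0;
    for (let i = start; i < end; i++) sum += values[i];
    sums.push(sum);
  }
  return sums;
}

// Stage 2: combine the few partial sums and divide once by the total count.
function average(values, groupSize) {
  const total = partialSums(values, groupSize).reduce((a, b) => a + b, 0);
  return total / values.length;
}

// 5000 numbers: 1, 2, ..., 5000
const data = Float32Array.from({ length: 5000 }, (_, i) => i + 1);
console.log(average(data, 256)); // 2500.5
```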
Me being able to handle myself with C++ is a relatively recent phenomenon, I'm pleased it's taking place. I don't like C++ that much in terms of syntax. I think some C++ extracts or functions can be very readable, and also look similar in lots of other languages and are easy to port between languages.
I have not drawn much of a judgement of NOOOCL apart from thinking it's got lots of potential. Looking at the source code, it covers a lot of the OpenCL API, providing ways of calling it asynchronously from JavaScript. I'm very interested in setting up my functions in C++ to work asynchronously and plan on using libuv for that. I'm not sure at this stage how much would need to be done with asynchronous callbacks in particular places or how much it will help, but I'll investigate further.
Writing asynchronous ES6 code like you have described sounds very good.
The OpenCL language looks like it can support a fair few things, with simple code, except lots of the code around it looks complex. It looks like there are a fair few OO features in OpenCL, including V 2.0, but I'm not quite sure what's in what version or even how to choose which version to run.
Now I'm expressing the kernel with some quite simple code that fits in reasonably well with JavaScript to add numbers:
var k2s = write_kernel('vecAdd', [['a', Float32Array], ['b', Float32Array]], ['res', Float32Array], `
res[id] = a[id] + b[id];
`);
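How such a helper might expand a one-line body into a full kernel can be sketched as follows. This is a guess at the general approach, not the actual write_kernel implementation; `buildKernelSource` and `CL_TYPES` are hypothetical names, and the generated kernel has no bounds check, so it assumes the global work size equals the array length.

```javascript
// Map typed-array constructors to OpenCL C element types (illustrative subset).
const CL_TYPES = new Map([
  [Float32Array, 'float'],
  [Int32Array, 'int'],
  [Uint32Array, 'uint'],
]);

// Hypothetical helper: wrap a kernel body in a complete OpenCL C kernel definition.
function buildKernelSource(name, inputs, output, body) {
  const params = inputs
    .map(([argName, type]) => `__global const ${CL_TYPES.get(type)}* ${argName}`)
    .concat(`__global ${CL_TYPES.get(output[1])}* ${output[0]}`);
  return [
    `__kernel void ${name}(${params.join(', ')}) {`,
    '    size_t id = get_global_id(0);', // one work-item per array element
    `    ${body.trim()}`,
    '}',
  ].join('\n');
}

const source = buildKernelSource(
  'vecAdd',
  [['a', Float32Array], ['b', Float32Array]],
  ['res', Float32Array],
  'res[id] = a[id] + b[id];'
);
console.log(source);
```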
I think we should continue this otherwise interesting discussion in another issue thread, not this one. This one is about the crash. Have you had the chance to try the fix that I released earlier?
OK, I understand.
No, I've not tried it yet. I'll try it soon, but am busy right now with my OpenCL system and am going out fairly soon.
I verified this on every machine on which I got this crash before. It seems that with the new version the crash is gone.
While the particle simulator sounds like an interesting project, it's not best suited to getting a quick understanding of how to use nooocl.
Examples such as
would be fast (for you) to make, and would be a good illustration of how to run OpenCL with NOOOCL.
Having at least one such complete example in the readme would be useful.
If you do one example, I could then attempt the other ones. Then you could look at them to see if there are different ways of doing the same things, using different parts of the NOOOCL / OpenCL interface.