gpujs / gpu.js

GPU Accelerated JavaScript
https://gpu.rocks
MIT License

Compute shaders support? #425

Open evdem opened 5 years ago

evdem commented 5 years ago

Experimental WebGL2 compute shaders are supported in Chrome Canary. See e.g. my pages https://www.ibiblio.org/e-notes/webgl/gpu/contents.htm. Is it possible to add WebGL2-compute support to gpu.js? Could you help (guide) me on how to add it? (I got a refusal from WebDNN :)

By the way, is HALF_FLOAT data supported in gpu.js? Simple test at https://www.ibiblio.org/e-notes/webgl/gpu/tex2_16.htm

Evgeny Demidov
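
For context on the HALF_FLOAT question: half floats are 16-bit IEEE 754 values, typically uploaded to WebGL as a Uint16Array with type gl.HALF_FLOAT. A minimal pack/unpack sketch in plain JavaScript (the helper names are mine, not gpu.js API; denormals are dropped on encode for brevity):

```javascript
// Encode a JS number into the 16-bit half-float layout used by gl.HALF_FLOAT.
function toHalf(value) {
  const f32 = new Float32Array(1);
  const u32 = new Uint32Array(f32.buffer);
  f32[0] = value;
  const bits = u32[0];
  const sign = (bits >>> 16) & 0x8000;
  let exp = (bits >>> 23) & 0xff;
  const mant = bits & 0x7fffff;
  if (exp === 0xff) return sign | 0x7c00 | (mant ? 0x200 : 0); // Inf / NaN
  exp = exp - 127 + 15;                        // rebias exponent (8 -> 5 bits)
  if (exp >= 0x1f) return sign | 0x7c00;       // overflow -> Inf
  if (exp <= 0) return sign;                   // underflow -> 0 (denormals dropped)
  return sign | (exp << 10) | (mant >> 13);    // truncate mantissa to 10 bits
}

// Decode a 16-bit half float back to a JS number.
function fromHalf(h) {
  const sign = (h & 0x8000) ? -1 : 1;
  const exp = (h >> 10) & 0x1f;
  const mant = h & 0x3ff;
  if (exp === 0) return sign * mant * Math.pow(2, -24);          // denormal
  if (exp === 0x1f) return mant ? NaN : sign * Infinity;         // Inf / NaN
  return sign * (1 + mant / 1024) * Math.pow(2, exp - 15);
}
```

This also makes the bandwidth argument concrete: a Uint16Array of packed halves is exactly half the bytes of the equivalent Float32Array.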

robertleeplummerjr commented 5 years ago

Can you show a scenario where it'd be helpful? I'm sure we could support it.

evdem commented 5 years ago

As for HALF_FLOAT: f16 gives a 2x acceleration (for free, and very useful for AI/ML, they say) on modern GPUs: Intel (Skylake, tested in OpenCL), mobile, and Nvidia (Turing, tested in CUDA). It helps both throughput (mediump precision) and bandwidth (HALF_FLOAT).

As for compute shaders, see https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm.htm. Cedric Nugteren (working in OpenCL, which is very similar to compute shaders) gets a 5x acceleration for the shared-memory kernel with respect to the naive one. I get only 50% in experimental WebGL2-compute so far. I can compare WebGL2 and WebGL2-compute (in a while).

jacobbogers commented 5 years ago

They are not called compute shaders; it's a render extension to WebGL2 (so not part of the standard), and it only lives in the fragment shader. I see from gpu.js that most of the heavy-duty stuff is done in the vertex shader; I don't know all of the code, so maybe the latter is not true anymore. For ML training I would still use 32-bit floating point; once your model has been trained, you can execute the model with 16-bit floats. That's how it is done.

robertleeplummerjr commented 5 years ago

I think vertex shaders are pretty light actually:

However, the bulk of code is in fragment shaders:

NOTE: for fragment shaders, much of the source is injected from the kernel class. Example for WebGL2: https://github.com/gpujs/gpu.js/blob/develop/src/backend/web-gl/kernel.js

I still don't see an issue with supporting it, even if it is WebGL2-only. The thing we have to think about is what the API looks like from a JavaScript viewpoint: what connects where, proposed features, etc.

jacobbogers commented 5 years ago

Great, thanks for clarifying about the fragment shaders.

evdem commented 5 years ago

I've added a WebGL2-based GEMM test (with R32F textures) at https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm.htm . Where can I find a gpu.js-based test? ;) Sorry, I have to check my test results for the GT 710 today (workgroup thread synchronization is corrected). I'm working on compute shader optimization for GEMM.

evdem commented 5 years ago
  1. WebGL2 is based on OpenGL ES 3.0; WebGL2-compute is based on ES 3.1.
  2. HALF_FLOAT and int8 math are very important for ML (remember tensor cores). They save bandwidth (at least, right now) and throughput (if FP16 math is supported by the hardware and drivers).
  3. My test results are still not clear (I hope optimized compute shaders will be 2x faster).
  4. All of that is not very urgent now (it's for the near future).
  5. Does gpu.js support FFT or image convolution (filtering by a small NxN kernel matrix (FLOAT, HALF_FLOAT, or maybe int8 :) for ML)?

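
On point 5: a small NxN convolution is straightforward to express as one output pixel per thread. A minimal CPU sketch in plain JavaScript (function names are illustrative, not gpu.js API), using clamp-to-edge addressing like a texture sampler:

```javascript
// Convolve a single-channel image (row-major Float32Array) with an n x n kernel.
function convolve(image, width, height, kernel, n) {
  const half = Math.floor(n / 2);
  const out = new Float32Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let acc = 0;
      for (let ky = 0; ky < n; ky++) {
        for (let kx = 0; kx < n; kx++) {
          // clamp-to-edge addressing, like sampling with CLAMP_TO_EDGE
          const sy = Math.min(height - 1, Math.max(0, y + ky - half));
          const sx = Math.min(width - 1, Math.max(0, x + kx - half));
          acc += image[sy * width + sx] * kernel[ky * n + kx];
        }
      }
      out[y * width + x] = acc;
    }
  }
  return out;
}
```

On the GPU the two outer loops become the per-fragment threads; only the ky/kx loops remain in the shader.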
robertleeplummerjr commented 5 years ago

Where are the sources for your tests? We just have to plan and make it practical for JavaScript, and then implementation will follow. If I can see your sources, it will help me and others see how it can be enabled and utilized.

robertleeplummerjr commented 5 years ago

This is it? https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm3.htm

evdem commented 5 years ago

Yes, that is the fastest, third shader (you can see the JS in the page source). Now I'm working on the fourth shader. The first one is the simplest. https://www.ibiblio.org/e-notes/webgl/gpu/mul/mul32b.htm is my WebGL2 test with R32F textures.

robertleeplummerjr commented 5 years ago

Very cool. It looks as though this is the main means of interaction then, that'd need to be integrated?:

for(int i=0; i<n; i++)
    acc += texelFetch(samp0, ivec2(gl_FragCoord.x, i), 0).r * 
           texelFetch(samp1, ivec2(i, gl_FragCoord.y), 0).r;

In other words, texelFetch?

robertleeplummerjr commented 5 years ago

Short of other configurations (for example, to gl.something), adding that into the member-expressions function node (the thing that transpiles), even if it is custom for WebGL2 (or webgl2-compute?), will be easy. Here is where it is currently implemented: https://github.com/gpujs/gpu.js/blob/adf5062427f18e41a6144dd1c8dcc0e099600cb9/src/backend/web-gl/function-node.js#L752

The javascript means of fallback would seem to be a simple array[z][y][x]
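
To illustrate that fallback idea: each texelFetch in the GLSL snippet above corresponds to a plain array lookup on the CPU. A minimal sketch of the naive GEMM loop against 2D JavaScript arrays (the function name is mine, not gpu.js API):

```javascript
// Naive n x n matrix multiply: out[y][x] = sum_i a[y][i] * b[i][x].
function matMulNaive(a, b, n) {
  const out = [];
  for (let y = 0; y < n; y++) {
    out.push(new Array(n).fill(0));
    for (let x = 0; x < n; x++) {
      let acc = 0;
      for (let i = 0; i < n; i++) {
        acc += a[y][i] * b[i][x]; // CPU stand-in for two texelFetch calls
      }
      out[y][x] = acc;
    }
  }
  return out;
}
```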

evdem commented 5 years ago

texelFetch(samp0, ivec2(gl_FragCoord.x, i), 0).r is the analog of texture2D(samp, vTexCoord).r. Look at the OpenGL ES spec or Google it. The mul32 script is just matrix multiplication.

jacobbogers commented 5 years ago

Hi evdem, sgemm is one of the most straightforward BLAS kernels (level 3) to port. I am doing the same thing for all my level-3 BLAS kernels, porting them to WebGL2: https://github.com/R-js/blasjs#level-3-routines For FFT, I am not sure if that is useful except for sound; for JPEG compression/decompression, maybe, but the Canvas object can handle that already.

evdem commented 5 years ago

I'm from the webgl-dev-list :) At first I'd like to get a good test case for compute shaders. It is interesting whether they will be faster than WebGL2 with textures. It is not clear, as a texture uses its own memory hardware.

robertleeplummerjr commented 5 years ago

> Look at OpenGL ES spec or Google

I'm just buried in getting headless-gl compliant with Khronos' tests at the moment, but I will take more time once I get past that. If you were to put into words what the main difference is between them, looking at https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm3.htm for example, is it pretty much everything else around this:

// Perform the computation for a single tile
for (uint k=0u; k < TS; k++) {
    for (uint w=0u; w < WPT; w++) {
        acc[w] += Asub[k][row] * Bsub[col + w*RTS][k];
    }
}
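
The loop above is the inner step of a tiled multiply: A and B are processed in TS x TS blocks so each loaded tile is reused many times (on the GPU the tiles sit in workgroup-shared memory, Asub/Bsub). A CPU sketch of the same blocking idea in plain JavaScript (illustrative, not taken from the linked page):

```javascript
// Blocked n x n matrix multiply: accumulate C in TS x TS tiles so the
// working set of A and B stays small, mirroring the shared-memory kernel.
function matMulTiled(a, b, n, TS) {
  const c = Array.from({ length: n }, () => new Float64Array(n));
  for (let i0 = 0; i0 < n; i0 += TS) {
    for (let j0 = 0; j0 < n; j0 += TS) {
      for (let k0 = 0; k0 < n; k0 += TS) {
        // one "tile": akin to loading Asub/Bsub, then the inner k/w loops
        for (let i = i0; i < Math.min(i0 + TS, n); i++) {
          for (let j = j0; j < Math.min(j0 + TS, n); j++) {
            let acc = c[i][j];
            for (let k = k0; k < Math.min(k0 + TS, n); k++) {
              acc += a[i][k] * b[k][j];
            }
            c[i][j] = acc;
          }
        }
      }
    }
  }
  return c;
}
```

On the CPU the win comes from cache locality; on the GPU, from the explicit shared memory plus the barrier() calls that keep a workgroup's tile loads in sync.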

In either case: you did a really good job commenting your code, ty!

robertleeplummerjr commented 5 years ago

@jacobbogers Your attention to detail in https://github.com/R-js/blasjs/blob/master/README.md is superb. You always surprise me how thorough you are!

evdem commented 5 years ago

I know only that FFT is used for water simulations :) @robertleeplummerjr: that's not me; look at https://cnugteren.github.io/tutorial/pages/page1.html by Cedric Nugteren.

robertleeplummerjr commented 5 years ago

This thread is becoming very valuable for references!

evdem commented 5 years ago

I'd like to finish with convolutional NNs in ML. They like to filter images by a small "self-teaching" NxN kernel many, many times. It is very suitable for the GPU. We can ask brain.js for details :)

robertleeplummerjr commented 5 years ago

The unfinished convolutions are here in brain.js: https://github.com/BrainJS/brain.js/blob/develop/src/layer/convolution.js

robertleeplummerjr commented 5 years ago

Mostly finished anyway. Here are the unit tests passing: https://travis-ci.com/BrainJS/brain.js-cnn-integrity/jobs/156623402

evdem commented 5 years ago

Is it plain JS or GPU shaders?

evdem commented 5 years ago

If it is JS, it takes "years".

jacobbogers commented 5 years ago

@evdem ,

jacobbogers commented 5 years ago

@evdem I checked out your(?) code at https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm2.htm

I don't see any reference in WebGL2 to functions like memoryBarrierShared() or barrier().

Maybe I missed something

evdem commented 5 years ago

It is WebGL2-compute; look at the OpenGL ES 3.1 spec or Google it.

jacobbogers commented 5 years ago

Found it, thanks :) I only started diving into GPUs recently; thanks for the tips.

evdem commented 5 years ago

@jacobbogers wouldn't you like to make a port of the CLBlast library to JS (WebGL2-compute)? Compute shaders are similar to OpenCL. Cedric Nugteren wrote that he is too busy.

robertleeplummerjr commented 5 years ago

> Is it plain JS or GPU shaders?

In regards to the referenced brain.js convolution code, it uses GPU.js, so technically it is both. And what we are planning in this very thread will affect it.

evdem commented 5 years ago

It is rather difficult to extract WebGL shaders from projects like gpu.js, WebDNN, TF.JS :)

evdem commented 5 years ago

TF.js hid the GPU docs from its site :)

jacobbogers commented 5 years ago

@evdem, I will answer your points:

robertleeplummerjr commented 5 years ago

> It is rather difficult to extract WebGL shaders from projects like gpu.js, WebDNN, TF.JS :)

Sure about that?

gpu.toString()

;)

evdem commented 5 years ago

@jacobbogers I meant the https://github.com/CNugteren/CLBlast OpenCL lib. It is faster than AMD's clBLAS: https://arxiv.org/abs/1705.05249. The TensorFlow project has a CLBlast backend.

evdem commented 5 years ago

@robertleeplummerjr gpu.toString()? But where is the matrix multiplication demo?

jacobbogers commented 5 years ago

Yes, I looked at the Nugteren link (forked it). I could see Nugteren forked it from https://github.com/clMathLibraries/clBLAS (AMD); the AMD one has all the BLAS subroutines implemented.

Thanks

evdem commented 5 years ago

http://gpu.rocks/playground/ :)

jacobbogers commented 5 years ago

@evdem your demo at https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm2b.htm shows a popup: "Can't get WebGL2 compute". Maybe a 404 happened somewhere?

evdem commented 5 years ago

You need Chrome Canary with WebGL2Compute enabled; see https://github.com/9ballsyndrome/WebGL_Compute_shader

robertleeplummerjr commented 5 years ago

> @robertleeplummerjr gpu.toString()? But where is the matrix multiplication demo?

The example is on the gpu.rocks webpage. The idea is that almost any JavaScript that works in a threaded approach similar to shaders could be used with gpu.js.

Full example with entire kernel output: https://jsfiddle.net/robertleeplummerjr/2g6Lhy1k/
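
For readers following along: gpu.js kernels are plain JavaScript functions that run once per output cell with this.thread.x / this.thread.y set. A minimal CPU stand-in for that model (runKernel2D is a hypothetical harness of mine, not the real gpu.js API) with the classic matrix-multiply kernel body:

```javascript
// Run a kernel body once per output cell, binding { thread: { x, y } } as `this`,
// the way a fragment shader runs once per pixel.
function runKernel2D(body, width, height, ...args) {
  const out = [];
  for (let y = 0; y < height; y++) {
    const row = [];
    for (let x = 0; x < width; x++) {
      row.push(body.call({ thread: { x, y } }, ...args));
    }
    out.push(row);
  }
  return out;
}

// The matrix-multiply body in the gpu.js style: one dot product per thread.
function matMulBody(a, b, size) {
  let sum = 0;
  for (let i = 0; i < size; i++) {
    sum += a[this.thread.y][i] * b[i][this.thread.x];
  }
  return sum;
}
```

The transpiler's job is then to map this.thread onto gl_FragCoord and the array reads onto texelFetch, as discussed earlier in this thread.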

jacobbogers commented 5 years ago

@evdem as of now I don't use anything "compute"-shader specific, just WebGL2 (+ one extension), as it needs to work on current devices.

When will compute shaders be normative? I read Firefox is hesitant to implement them.

evdem commented 5 years ago

Google's ANGLE is used by Chrome, Firefox, and Opera. Edge is transitioning to the Blink (Chrome) backend this year. The OpenGL backend for compute is ready and tested. Intel is making the D3D11 backend. So this year it will be ready, and you have ~1 year to prepare your software. And you can help test it :)

jacobbogers commented 5 years ago

AHA -> gl = cnv.getContext('webgl2-compute')

evdem commented 5 years ago

Only Apple uses Metal and WebGPU. Google is working on a Vulkan backend for ANGLE, but I think it will just be a successor to compute.

robertleeplummerjr commented 5 years ago

And, FYI, the headless-gl project I'm working on to facilitate gpu.js uses Google ANGLE as well.

jacobbogers commented 5 years ago

@robertleeplummerjr you are ahead of the curve)) @evdem very cool stuff. Now my problem: which commands are "compute" and which aren't? /OpenGL/specs/es/3.0/es_spec_3.0 doesn't show that.

jacobbogers commented 5 years ago

OK, this was a very interesting chat. If it is here within a year, I'd better switch to "compute shaders"... These extra facilities are ALL discussed and explained in "es_spec_3.0", right? Still my last question: how do I know what is specific to the "webgl2-compute" context?

evdem commented 5 years ago

gl = cnv.getContext('webgl2-compute') returns a 'webgl2-compute' context. There are webgl, webgl2, and now the new experimental webgl2-compute context based on OpenGL ES 3.1.
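
A minimal sketch of probing for the most capable of those contexts and falling back (assuming only the standard getContext() API; the helper name is mine):

```javascript
// Try context names from most to least capable; getContext() returns null
// for names the browser does not support.
function getBestGLContext(canvas) {
  for (const name of ['webgl2-compute', 'webgl2', 'webgl']) {
    const gl = canvas.getContext(name);
    if (gl) return { name, gl };
  }
  return null;
}
```

In today's browsers this usually lands on 'webgl2'; only Chrome Canary with the WebGL2Compute flag would return the first entry.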

evdem commented 5 years ago

OpenGL ES 3.1! Ask at the https://groups.google.com/d/forum/webgl-dev-list forum :)