Open evdem opened 5 years ago
Can you show a scenario where it'd be helpful? I'm sure we could support it.
As for HALF_FLOAT: f16 gives a 2x speedup (for free, and reportedly very useful for AI/ML) on modern GPUs: Intel (Skylake, tested in OpenCL), mobile, and Nvidia (Turing, in CUDA). The gain applies to both throughput (mediump precision) and bandwidth (HALF_FLOAT).
As for compute shaders, see https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm.htm. Cedric Nugteren (in OpenCL, which is very similar to compute shaders) gets a 5x speedup for a kernel with shared memory with respect to the naive one. So far I get only 50% in experimental WebGL2-compute. I can compare WebGL and WebGL2-compute (in a while).
They are not called compute shaders; it's a render extension to WebGL2 (so not part of the standard), and it only lives in the fragment shader. From what I see in gpu.js, most of the heavy-duty work is done in the vertex shader; I don't know all of the code, so maybe that is no longer true. For ML training I would still use 32-bit floating point. Once your model has been trained, you can execute it with 16-bit floats; that's how it is done.
I think vertex shaders are pretty light actually:
However, the bulk of code is in fragment shaders:
NOTE: for fragment shaders, much of the source is injected from the kernel class. Example for WebGL2: https://github.com/gpujs/gpu.js/blob/develop/src/backend/web-gl/kernel.js
I still don't see an issue with support, even if it is only webgl2. The thing we have to think about is what the API looks like from a javascript viewpoint. Like what connects where, and proposed features etc.
Great, thanks for clarifying about the fragment shaders.
I've added a WebGL2-based GEMM test (with R32F textures) at https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm.htm . Where can I find a gpu.js-based test? ;) Sorry, today I have to check my test results for the GT 710 (WG thread synchronization is corrected). I'm working on compute shader optimization for GEMM.
Where are the sources for your tests? We just have to plan and make it practical for JavaScript, and then implementation will follow. If I can see your sources, it can help me and others see how it can be enabled and utilized.
Yes, it is the third shader, the fastest one (you can see the JS in the page source). Now I'm working on the 4th shader. The first one is the simplest. https://www.ibiblio.org/e-notes/webgl/gpu/mul/mul32b.htm is my WebGL2 test with R32F textures.
Very cool. It looks as though this is the main means of interaction then, that'd need to be integrated?:
for(int i=0; i<n; i++)
acc += texelFetch(samp0, ivec2(gl_FragCoord.x, i), 0).r *
texelFetch(samp1, ivec2(i, gl_FragCoord.y), 0).r;
In other words, texelFetch?
Short of other configurations, for example, to gl.something, adding that into the member expressions function node (the thing that transpiles), even if it is custom for webgl2 (or webgl2-compute?) will be easy. Here is where it is currently implemented: https://github.com/gpujs/gpu.js/blob/adf5062427f18e41a6144dd1c8dcc0e099600cb9/src/backend/web-gl/function-node.js#L752
The javascript means of fallback would seem to be a simple array[z][y][x]
texelFetch(samp0, ivec2(gl_FragCoord.x, i), 0).r is the analog of texture2D(samp, vTexCoord).r. Look at OpenGL ES spec or Google. The mul32 script is just matrix multiplication.
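For readers following along without a GPU, the shader loop quoted above can be mirrored on the CPU. This is only a sketch: `makeTexture`, `fragment`, and `multiply` are hypothetical helper names, the JavaScript `texelFetch` here is a stand-in written for this example (not a WebGL call), and the texture is assumed to be a single-channel float texture stored row by row.

```javascript
// Hypothetical CPU stand-in for a single-channel (R32F-style) texture:
// a flat Float32Array stored row by row, plus its dimensions.
function makeTexture(width, height, values) {
  return { width, height, data: Float32Array.from(values) };
}

// Imitates texelFetch(samp, ivec2(x, y), 0).r: integer texel coordinates,
// no filtering, unlike the normalized coordinates used by texture2D.
function texelFetch(tex, x, y) {
  return tex.data[y * tex.width + x];
}

// Mirrors the fragment shader loop for one output pixel (fragX, fragY).
function fragment(samp0, samp1, n, fragX, fragY) {
  let acc = 0;
  for (let i = 0; i < n; i++) {
    acc += texelFetch(samp0, fragX, i) * texelFetch(samp1, i, fragY);
  }
  return acc;
}

// Runs the "shader" once per output pixel of an n x n render target.
function multiply(samp0, samp1, n) {
  const out = new Float32Array(n * n);
  for (let y = 0; y < n; y++) {
    for (let x = 0; x < n; x++) {
      out[y * n + x] = fragment(samp0, samp1, n, x, y);
    }
  }
  return out;
}
```

Calling `multiply(makeTexture(2, 2, [1, 2, 3, 4]), makeTexture(2, 2, [5, 6, 7, 8]), 2)` walks the same index pattern as the shader, with one `fragment` call per output texel.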
Hi evdem,
sgemm is one of the most straightforward BLAS kernels (level 3) to port. I am doing the same thing for all my level 3 BLAS kernels, porting them to WebGL2.
Level 3 routines: https://github.com/R-js/blasjs#level-3-routines
For FFT, I am not sure that is useful except for sound; for JPEG compression/decompression, maybe, but the Canvas object can handle that already.
I'm from webgl-dev-list :) At first I'd like to get a good test case for compute shaders. It would be interesting to see whether they are faster than WebGL2 with textures. It is not clear, as textures use their own memory hardware.
Look at OpenGL ES spec or Google
I'm just buried in getting headless-gl compliant with Khronos' tests at the moment, but will take more time once I get past that. If you were to put into words what the main difference is between them, looking at https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm3.htm, for example, is it pretty much everything else around this:
// Perform the computation for a single tile
for (uint k=0u; k < TS; k++) {
for (uint w=0u; w < WPT; w++) {
acc[w] += Asub[k][row] * Bsub[col + w*RTS][k];
}
}
In either case: You did a really good job commenting on your code, ty!
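To make the tiling idea behind that loop concrete, here is a rough CPU sketch in plain JavaScript. It is not the shader from the linked page: it leaves out the per-thread work factor (WPT/RTS) and the barriers, `tiledMultiply` is a hypothetical name, and it assumes square row-major matrices whose size `n` is a multiple of the tile size `TS`. `Asub` and `Bsub` play the role that shared memory plays on the GPU.

```javascript
// C = A * B for n x n row-major matrices, processed in TS x TS tiles.
// Assumption: n is a multiple of TS.
function tiledMultiply(A, B, n, TS) {
  const C = new Float32Array(n * n);
  const Asub = new Float32Array(TS * TS); // stand-in for shared memory
  const Bsub = new Float32Array(TS * TS);
  for (let tileRow = 0; tileRow < n; tileRow += TS) {
    for (let tileCol = 0; tileCol < n; tileCol += TS) {
      for (let t = 0; t < n; t += TS) {
        // Load phase: copy one tile of A and one tile of B.
        // On the GPU a barrier would follow so all threads see the tiles.
        for (let r = 0; r < TS; r++) {
          for (let c = 0; c < TS; c++) {
            Asub[r * TS + c] = A[(tileRow + r) * n + (t + c)];
            Bsub[r * TS + c] = B[(t + r) * n + (tileCol + c)];
          }
        }
        // Compute phase: accumulate the partial products for this tile.
        for (let r = 0; r < TS; r++) {
          for (let c = 0; c < TS; c++) {
            let acc = 0;
            for (let k = 0; k < TS; k++) {
              acc += Asub[r * TS + k] * Bsub[k * TS + c];
            }
            C[(tileRow + r) * n + (tileCol + c)] += acc;
          }
        }
      }
    }
  }
  return C;
}
```

On the CPU this gains nothing by itself; the point is only to show that each element of A and B is read from "global" memory once per tile instead of once per output element, which is where the GPU version is expected to save bandwidth.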
@jacobbogers Your attention to detail in https://github.com/R-js/blasjs/blob/master/README.md is superb. You always surprise me how thorough you are!
I only know that FFT is used for water simulations :) @robertleeplummerjr: not me; look at https://cnugteren.github.io/tutorial/pages/page1.html by Cedric Nugteren.
This thread is becoming very valuable for references!
I'd like to finish with convolutional NNs in ML. They filter images with a small "self-teaching" NxN kernel many, many times. This is very suitable for the GPU. We can ask brain.js for details :)
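As a rough illustration of that filtering step (not taken from brain.js or any other project), here is a plain JavaScript sketch of sliding an NxN kernel over a single-channel image. `convolve2d` is a hypothetical name, the image is assumed to be a flat row-major array, no padding or stride is applied, and the kernel is not flipped (the cross-correlation convention commonly used in CNN layers).

```javascript
// Slides an N x N kernel over a width x height single-channel image.
// Output size is (width - N + 1) x (height - N + 1): no padding, stride 1.
function convolve2d(image, width, height, kernel, N) {
  const outW = width - N + 1;
  const outH = height - N + 1;
  const out = new Float32Array(outW * outH);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      let acc = 0;
      // Each output pixel is independent of the others,
      // which is why this maps well to one GPU thread per pixel.
      for (let ky = 0; ky < N; ky++) {
        for (let kx = 0; kx < N; kx++) {
          acc += image[(y + ky) * width + (x + kx)] * kernel[ky * N + kx];
        }
      }
      out[y * outW + x] = acc;
    }
  }
  return out;
}
```

In a trained network the kernel values would come from training; here they are just an input array.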
The unfinished convolutions are here in brain.js: https://github.com/BrainJS/brain.js/blob/develop/src/layer/convolution.js
Mostly finished anyway. Here are the unit tests passing: https://travis-ci.com/BrainJS/brain.js-cnn-integrity/jobs/156623402
is it plain JS or GPU shaders?
if it is JS it takes "years"
@evdem ,
The blas.JS library is part of R-js (R statistics), so think of linear regression etc., not FEMM. R is not used for NN training or fluid dynamics simulations, although I am sure someone somewhere made an extension for that.
I have converted most of the BLAS level 3 subroutines to GPU (private repo, as it is under construction).
@evdem I checked out your(?) code at https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm2.htm
I don't see any reference in WebGL2 to functions like memoryBarrierShared() or barrier(). Maybe I missed something.
It is WebGL2-compute; look at the OpenGL ES 3.1 spec or Google.
Found it, thanks) I only recently started diving into GPU; thanks for the tips.
@jacobbogers wouldn't you like to port the CLBlast library to JS (WebGL2-compute)? Compute shaders are similar to OpenCL. Cedric Nugteren wrote that he is too busy.
is it plain JS or GPU shaders?
In regards to the referenced brain.js convolution code, it uses GPU.js, so technically, it is both. And what we are planning with this very thread will affect it.
It is rather difficult to extract WebGL shaders from projects like gpu.js, WebDNN, TF.JS :)
TF.JS hid the GPU docs from its site :)
@evdem, I will answer your points:
I am porting all kernels from BLAS to GPU (I started with level 3 because it's the most important).
I read about your shared memory experiments getting a 50% boost. I think it depends heavily on browser/driver/OS/hardware? What is your experience using single-color float channels vs multicolor? Would the GPU just "waste space" with the other channels internally?
I checked clBLAS. As far as I can see, there is no specific algo for band matrices; for symmetric matrices I see a "hermetian" file (maybe that's the one for symmetric matrices at least). I am surely going to look at
I agree there is a mapping challenge from JS/C/whatever to composing shaders. That's like every compiler/transpiler problem ever))
It is rather difficult to extract WebGL shaders from projects like gpu.js, WebDNN, TF.JS :)
Sure about that? gpu.toString() ;)
@jacobbogers I meant the https://github.com/CNugteren/CLBlast OpenCL lib. It is faster than AMD's clBLAS: https://arxiv.org/abs/1705.05249. The TensorFlow project has a CLBlast backend.
@robertleeplummerjr gpu.toString()? but where is matrix multiplication demo?
Yes, I looked at the Nugteren link (forked it). I could see Nugteren forked it from https://github.com/clMathLibraries/clBLAS (AMD); the AMD one has all the BLAS subroutines implemented.
Thanks
@evdem your demo at https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm2b.htm shows a popup: "Can't get WebGL2 compute"
Maybe a 404 happened somewhere?
You need Chrome Canary with WebGL2 Compute enabled; see https://github.com/9ballsyndrome/WebGL_Compute_shader
@robertleeplummerjr gpu.toString()? but where is matrix multiplication demo?
The example is on the gpu.rocks webpage. The idea is that almost any JavaScript that works in a threaded approach similar to shaders could be used with gpu.js.
Full example with entire kernel output: https://jsfiddle.net/robertleeplummerjr/2g6Lhy1k/
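The "threaded approach" can be sketched in plain JavaScript without loading gpu.js at all. In the sketch below, `runKernel` is a hypothetical CPU emulator written for this example, not part of gpu.js; it only imitates the `this.thread.x` / `this.thread.y` convention that gpu.js kernels use, calling the kernel function once per output element.

```javascript
// Hypothetical CPU emulator: calls the kernel once per output element,
// exposing the element's position as this.thread.x / this.thread.y.
function runKernel(kernel, width, height, ...args) {
  const out = [];
  for (let y = 0; y < height; y++) {
    const row = new Float32Array(width);
    for (let x = 0; x < width; x++) {
      row[x] = kernel.apply({ thread: { x, y } }, args);
    }
    out.push(row);
  }
  return out;
}

// Kernel body written the way a shader thinks: it computes exactly one
// output element and never loops over the output itself.
function multiplyKernel(a, b, size) {
  let sum = 0;
  for (let i = 0; i < size; i++) {
    sum += a[this.thread.y][i] * b[i][this.thread.x];
  }
  return sum;
}
```

With gpu.js itself, a function of this shape would be handed to `gpu.createKernel(...)` with an output size set via `setOutput`, and the library would take over what `runKernel` does here.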
@evdem as of now I don't use any specific "compute" shader, just WebGL2 (+ one extension), as it needs to work on current devices.
When will compute shaders be normative? I read Firefox is hesitant to implement it.
Google's ANGLE is used by Chrome, Firefox, and Opera. Edge is moving to the Blink (Chrome) backend this year. The OpenGL backend for compute is ready and tested. Intel is making the D3D11 backend. So this year it will be ready, and you have ~1 year to prepare your software. And you can help test it :)
AHA -> gl = cnv.getContext('webgl2-compute')
Only Apple uses Metal and WebGPU. Google is working on a Vulkan backend for ANGLE, but I think it will be just the compute successor.
And, fyi, the headless-gl project I'm working on to facilitate gpu.js uses Google ANGLE as well.
@robertleeplummerjr you are ahead of the curve)) @evdem very cool stuff. Now my problem: which commands are "compute" and which aren't? /OpenGL/specs/es/3.0/es_spec_3.0 doesn't show that.
OK, this was a very interesting chat. If it is here within a year, I'd better switch to "compute shaders". These extra facilities are ALL discussed and explained in "es_spec_3.0", right? Still, my last question: how do I know what is specific to the "webgl2-compute" context?
gl = cnv.getContext('webgl2-compute') returns a 'webgl2-compute' context. There are webgl, webgl2, and now the new experimental webgl2-compute context, based on OpenGL ES 3.1.
OpenGL ES 3.1! Ask on the https://groups.google.com/d/forum/webgl-dev-list forum :)
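Since the compute context is experimental, a page would probably want to fall back to the other context types when it is missing. A minimal sketch, assuming only that `getContext` returns `null` for a context name the browser does not support; `getBestContext` is a hypothetical helper name.

```javascript
// Tries context names from most to least capable and returns the first
// one the canvas can provide, or null if none is available.
function getBestContext(canvas, names = ['webgl2-compute', 'webgl2', 'webgl']) {
  for (const name of names) {
    const gl = canvas.getContext(name);
    if (gl) {
      return { name, gl };
    }
  }
  return null;
}
```

In a browser this might be used as `const ctx = getBestContext(document.createElement('canvas'));`, after which `ctx.name` tells the code whether compute shaders or only texture-based fragment shaders are available.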
Experimental WebGL2 Compute shaders are supported in Chrome Canary. See e.g. my pages https://www.ibiblio.org/e-notes/webgl/gpu/contents.htm. Is it possible to add WebGL2-compute to gpu.js? Could you help (guide) me in adding them? (I got a refusal from WebDNN :)
By the way, is HALF_FLOAT data supported in gpu.js? Simple test at https://www.ibiblio.org/e-notes/webgl/gpu/tex2_16.htm
Evgeny Demidov