This is amazing.
How many kernels are you using? Is this currently implemented?
Hi Robert, Thank you for the kind words. The link with the web app (http://castlemountain.dk/test/sprayMPR22GPU.html) does not implement gpu.js but has 3 different webworkers; one for each image stack. The above GPU.js script is just an excerpt from what I have been playing with. I'm not sure I understand your question regarding the number of kernels? Do you have a suggestion for how I can exploit the multi-thread capability of GPU.js for my app?
I’m in dire need of input regarding a project that needs fast computation of ray tracings
We love a good raytrace!
The app has OK speed for a stack of 512x512px images
Running a profiler over your work, I get the following:
Which is saying one of your largest bottlenecks is actually getting data out of the canvas, and later back into it. The calculations look like this on the profiler:
The above test script for generation of a 3D array is not running faster than an implementation in javascript with triple nested for loops running on an i7 processor with geforce gtx 960m graphics card. How is that?
This is because there is hardly any math in the kernel. Since the math is extremely simple, the CPU can optimize it, and there will be very little, if any, gain from the GPU. The GPU is hungry for larger, more complex problems.
Also, there is a lot going on in gpu.js: we parse the function you send in, create an abstract syntax tree, traverse the tree to generate GLSL, create a GLSL program from a fragment shader and a vertex shader, bind the program to a helper function, and upload the input values to textures. 99% of this overhead goes away after the first run of the kernel.
To get a clearer picture of the speed at which the kernel runs, call myFunc.build() before timing it, then time it. Likely the number will drop considerably.
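For example, a minimal timing sketch (the kernel here is just a stand-in for whatever you are benchmarking):
var gpu = new GPU();
var myFunc = gpu.createKernel(function() {
  return this.thread.x + this.thread.y;
}).setOutput([512, 512]);

myFunc.build();                        // compile once, outside the timed region
var start = Date.now();
var result = myFunc();                 // only the actual kernel run is timed
console.log(Date.now() - start, 'ms');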
Is it possible to transfer the 3D voxel image data to the GPU when the app is started for subsequent referencing and thus avoiding the transfer of these data each time
Yes, all you need to do is render your image to a texture like so:
const kernel = gpu.createKernel(function(imageData) {
  return imageData[this.thread.y][this.thread.x];
}, {
  output: [512, 512],
  outputToTexture: true
});
const texture = kernel(imageData);
Later, you can feed the image in using otherKernel(texture); at this point it is just a plain JavaScript value that happens to be bound to a WebGL texture.
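For instance, a second kernel consuming that texture might look like this (otherKernel and the math inside it are only illustrative):
const otherKernel = gpu.createKernel(function(imageData) {
  // imageData here is the texture produced by the first kernel,
  // so the pixel values never leave the GPU between the two kernels
  return imageData[this.thread.y][this.thread.x] * 1.5; // arbitrary example math
}).setOutput([512, 512]);

const result = otherKernel(texture);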
Is it possible to take a different approach and increase the speed of running through the pixel values with GPU.js
Based on that profiler report, I'd say you have the right idea. Once you implement the gpu.js approach and are no longer using toDataURL and WebWorkers, I bet it will be upwards of 20x faster.
Hi Robert,
First of all, thanks for the extensive reply! And sorry for the delay in response time, I have been on a boat for a day! However, it seems that the reply was not extensive enough for me to get a grip on how to implement this with gpu.js :( I see that there is some overhead in toDataURL and I have changed the code as suggested. But what really slows things down is what I have previously described: increasing the slice thickness, which lengthens the outer loop for(s=sliceNoBegin;s<endSlice;s++) of the function getImagesDataSwivelAxial. I have updated the “proof of concept script” referenced above as suggested by calling build():
var gpu = new GPU(); // gpu.js instance (assumed to already be available on the page)
var input = [512,512,30]; // to emulate a stack of images; thirty 512x512 px images
const myFunc = gpu.createKernel(function() {
  return this.thread.x + this.thread.y + this.thread.y;
}).setOutput(input);
myFunc.build(); // compile the kernel before timing
var startTime = new Date().getTime();
const c = myFunc();
var endTime = new Date().getTime() - startTime;
console.log(endTime);
However, the run time of this script (simulating the traversal of 30 images of 512x512 pixels) is approx. 1596 ms, which is far more than the runtime of the current pure-JavaScript function getImagesDataSwivelAxial when it is called to generate a slice based on 30 512x512px images (approx. 200 ms). Something must be wrong in the above proof of concept script!?
I see the point with the texture generation, which allows data to stay on the GPU for future referencing in a kernel.
As far as I understand, if the function getImagesDataSwivelAxial is to be altered to incorporate gpu.js acceleration, one would need to generate a texture based on the data in theImagesData (which contains an array of the canvas imagedata of the original axial slices); this data could be converted to a regular array (i.e. no canvas imagedata) before generating the gpu.js texture. Afterwards a kernel could be called running the three loops:
for (s = sliceNoBegin; s < endSlice; s++) {
  …
  for (i = 0; i < sHeight; i++) {
    …
    for (l = 0; l < sWidth; l++) {
      …
to traverse the slices in the texture and extract the voxel data from the provided texture to generate one 512x512 px slice (based on the current projection angulation/rotation and slice thickness) to be drawn on the canvas.
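Just to sketch what I mean by converting the canvas imagedata to a regular array first (the function name is made up, and I assume the grayscale value can be taken from the red channel):
function flattenStack(theImagesData, width, height) {
  var stack = [];
  for (var s = 0; s < theImagesData.length; s++) {
    var rgba = theImagesData[s].data;      // RGBA bytes from getImageData()
    var slice = new Array(width * height);
    for (var p = 0; p < width * height; p++) {
      slice[p] = rgba[p * 4];              // grayscale images, so the red channel is enough
    }
    stack.push(slice);
  }
  return stack;                            // stack[s][y * width + x] = voxel value
}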
Robert, I know it is a lot to ask, but perhaps you could write some proof of concept code in order for me to better understand? For example, a kernel that generates a concrete data texture (e.g. an array of 30 (slices) x 512 x 512 pixels) and the kernel function needed to speedily run through all voxels, as one would need to do in the function getImagesDataSwivelAxial?
Yours sincerely
I think the part that would make this clearer is that with a gpu kernel, you have to think really really small. In the case of this:
for (s = sliceNoBegin; s < endSlice; s++) {
  …
  for (i = 0; i < sHeight; i++) {
    …
    for (l = 0; l < sWidth; l++) {
      …
This (looping through each value and running gpu.js on them, if I understand it correctly) would not be how you'd generally use gpu.js. The kernel operates on pixels that run in tandem, so every pixel (or target number) already has its x, y, z set; they execute in parallel, likely hundreds of pixels at a time.
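As a very rough sketch of what that looks like (the names and the maximum-intensity logic here are only illustrative, and it steps straight along the stack rather than along your oblique vectors):
const mipKernel = gpu.createKernel(function(volume) {
  // this.thread.x / this.thread.y identify the output pixel; gpu.js runs this
  // function for every output pixel in parallel, so only the slice-thickness
  // loop remains inside the kernel
  var maxValue = 0;
  for (var s = 0; s < this.constants.slices; s++) {
    maxValue = Math.max(maxValue, volume[s][this.thread.y][this.thread.x]);
  }
  return maxValue;
}, {
  constants: { slices: 30 },
  output: [512, 512]
});

const slice = mipKernel(voxelStack); // voxelStack: a 30x512x512 array or texture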
This video illustrates the difference well: https://www.youtube.com/watch?v=-P28LKWTzrI
Skip to 1:22, and see how the paintballs are all in motion, but each is striking the canvas at different times? Each paintball is in its own thread, and we just need to do a little raytracing to get the values needed to build up the targets.
Can you email me at my github profile email address and we can work out the details of your project?
https://github.com/gpujs/gpu.js/releases/tag/1.4.0 adds the missing features needed here. More on this soon.
This idea eventually progressed into https://github.com/Mulrecon/Mulrecon/blob/master/index.html. We can open up new issues here for performance, but as a first proof of concept or MVP, we have this running fully on gpu.js: https://user-images.githubusercontent.com/679099/41056865-7c175eae-6993-11e8-8fb7-0c6cad9e87f0.png
Perfect! I'll play around with the POC and give you some feedback regarding the axes and in general!
Dear GPU.JS community
I’m in dire need of input regarding a project that needs fast computation of ray tracings. I came across GPU.js and was wondering if it can be used for speed optimization in relation to my problem. Here is some background info.
See the following prototype: http://castlemountain.dk/test/sprayMPR22GPU.html
The web app takes a stack of JPEG images and extracts the canvas imagedata to generate a 3D matrix of voxel values. A 3D cube with node (x,y,z) values can be rotated, which allows for the generation of double oblique planes; 2D images are thus generated on the fly by extracting the respective values from the voxels in the aforementioned 3D matrix.
However, the app is currently running in javascript and (lack of) speed is an issue.
Currently an arbitrary image slice is generated by a function (see getImagesDataSwivelAxial in workersMPRGPU.js) that produces the pixels of a 2D image from the starting voxel (x,y,z), the height/width of the image in question, and two vectors: one in the stack scroll direction, the other in the height direction of the generated slice. Each pixel of the generated image is thereby mapped to the corresponding voxel in the 3D matrix.
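In simplified terms, per generated pixel it boils down to something like the following (an illustration only; the names do not match the actual source, and I use an explicit width vector here):
// Simplified illustration of the per-pixel mapping (hypothetical names):
// start = starting voxel (x, y, z) of the generated slice
// vecH  = step per pixel in the slice's height direction
// vecW  = step per pixel in the slice's width direction
function extractSlice(voxels, start, vecH, vecW, sWidth, sHeight) {
  var slicePixels = new Array(sWidth * sHeight);
  for (var i = 0; i < sHeight; i++) {
    for (var l = 0; l < sWidth; l++) {
      var x = Math.round(start.x + i * vecH.x + l * vecW.x);
      var y = Math.round(start.y + i * vecH.y + l * vecW.y);
      var z = Math.round(start.z + i * vecH.z + l * vecW.z);
      slicePixels[i * sWidth + l] = voxels[z][y][x];
    }
  }
  return slicePixels;
}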
The app has OK speed for a stack of 512x512px images when only one slice is generated. However, when multiple slices (slice thickness can be adjusted with the arrows in the top toolbar) are averaged or traversed to generate maximum- or minimum-intensity projections, running speed becomes unbearable.
I have tried experimenting with gpu.js to see if it is possible to achieve a performance boost relative to JavaScript, but I can't seem to get a good grip on how to approach the problem with the GPU.js API:
The above test script for generation of a 3D array is not running faster than an implementation in javascript with triple nested for loops running on an i7 processor with geforce gtx 960m graphics card. How is that?
Furthermore, if the array with the 3D voxel data is supplied in the kernel function call running speed significantly decreases.
Is it possible to transfer the 3D voxel image data to the GPU when the app is started for subsequent referencing and thus avoiding the transfer of these data each time the kernel function is run (i.e. for the app when a new slice is generated)?
Is it possible to take a different approach and increase the speed of running through the pixel values with GPU.js?
I hope someone can clarify some of the issues raised above for me.
Sincerely