gpujs / gpu.js

GPU Accelerated JavaScript
https://gpu.rocks
MIT License

How to compute a sum matrix? #654

Closed: githubaccount256 closed this issue 3 years ago

githubaccount256 commented 3 years ago

Let's say I have a 4x5 (width x height) data matrix, i.e. 5 rows of 4 values, with these values:

3 1 5 2
8 9 2 1
1 2 7 4
9 6 3 5
8 2 6 3

Is it possible to use GPU.js to compute the column-wise cumulative sum matrix, i.e. to perform this dynamic programming step on the GPU? The resulting output would be the following matrix (values zero-padded to two digits for alignment):

03 01 05 02
11 10 07 03
12 12 14 07
21 18 17 12
29 20 23 15

The first row of the resulting matrix is just the first row of the data matrix. Every later row is the corresponding value in the data matrix plus the running sum so far, which is the output value directly above it. So the first value in the second row is 11 because 8 + 3 = 11. Similarly, the first value in the third row is 12 because 1 + 11 = 12, the first value in the fourth row is 21 because 9 + 12 = 21, and the first value in the last row is 29 because 8 + 21 = 29.
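Written as a recurrence, with P the data matrix, S the output matrix, r the row index, c the column index and H the number of rows (symbols introduced here for illustration), the rule above and its equivalent closed form are:

```latex
S_{0,c} = P_{0,c}, \qquad
S_{r,c} = P_{r,c} + S_{r-1,c} = \sum_{k=0}^{r} P_{k,c}, \qquad r = 1, \dots, H-1
```

For example, S_{1,0} = P_{1,0} + S_{0,0} = 8 + 3 = 11. The recursive form is what a sequential loop computes; the closed form is what lets each output cell be computed independently, at the price of re-summing the whole column above it.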

I am currently doing this iteratively in my program:

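// Column-wise cumulative sum: each output cell is the data value plus the
// output value directly above it. `pixels` holds width * height values in
// row-major order.
function createOutputMatrix(pixels, width, height) {
  const output = new Uint32Array(width * height);

  // The first row is copied unchanged.
  for (let col = 0; col < width; col++) {
    output[col] = pixels[col];
  }

  // Every later row adds the running sum stored in the row above.
  for (let row = 1; row < height; row++) {
    for (let col = 0; col < width; col++) {
      const currRowIdx = row * width + col;
      const prevRowIdx = (row - 1) * width + col;

      output[currRowIdx] = pixels[currRowIdx] + output[prevRowIdx];
    }
  }

  return output;
}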

But this is very slow, so I was wondering whether I could use GPU.js to improve the speed. I tried to do exactly that, but I am not sure how to carry state between steps in GPU.js. Because each row depends on the previous row's result (the cumulative sum), each row has to wait for the previous row to finish first.

I suppose it could be done by only parallelizing the columns, and then calling a new kernel function for each row, but that shuffling of data back and forth between the CPU and GPU for each row sounds slow. Can this be done efficiently entirely on the GPU?
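The per-row scheme described above can be sketched on the CPU to make its dependency structure visible. This is only a simulation: `rowPass` and `columnPrefixSumByRows` are hypothetical names, not GPU.js APIs, and each call to `rowPass` stands in for one kernel launch with an output of `width` cells. The columns inside a pass are independent, but the height - 1 passes must run strictly in order.

```javascript
// CPU-only simulation of "one kernel launch per row".
// rowPass is a hypothetical stand-in for a kernel with output [width]:
// every column could be its own GPU thread, but pass `row` needs the
// finished result of pass `row - 1`.
function rowPass(pixels, prevRow, row, width) {
  const next = new Uint32Array(width);
  for (let col = 0; col < width; col++) {
    // One "thread" per column; it reads only the previous pass's output.
    next[col] = pixels[row * width + col] + prevRow[col];
  }
  return next;
}

function columnPrefixSumByRows(pixels, width, height) {
  const output = new Uint32Array(width * height);
  // Row 0 is copied as-is.
  let prev = pixels.slice(0, width);
  output.set(prev, 0);
  // height - 1 dependent passes, each `width` cells wide.
  for (let row = 1; row < height; row++) {
    prev = rowPass(pixels, prev, row, width);
    output.set(prev, row * width);
  }
  return output;
}

const pixels = new Uint32Array([
  3, 1, 5, 2,
  8, 9, 2, 1,
  1, 2, 7, 4,
  9, 6, 3, 5,
  8, 2, 6, 3,
]);
console.log(Array.from(columnPrefixSumByRows(pixels, 4, 5)));
```

The number of sequential steps equals the number of rows, so a tall, narrow matrix gives the GPU very little parallel work per launch.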

githubaccount256 commented 3 years ago

For example, the following snippet seems to work, except that for every index in the output array it recomputes all prior sum values in a for-loop. Is there any way to avoid this or make it more efficient? My current implementation is very slow:

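// Assumes Node.js with the gpu.js package installed; in the browser,
// GPU is a global provided by the gpu.js script tag instead.
const { GPU } = require('gpu.js');
const gpu = new GPU();

const pixels = new Uint32Array([
  3, 1, 5, 2,
  8, 9, 2, 1,
  1, 2, 7, 4,
  9, 6, 3, 5,
  8, 2, 6, 3
]);

// Each thread computes one output cell by re-summing every value above it
// in the same column, so the work per thread grows with its row index.
function kFunction() {
  let idx = this.thread.x;
  let width = this.constants.width;

  let row = Math.floor(idx / width);
  let col = idx % width;

  let sum = 0;
  let base = (row * width) + col;

  // Sum rows 0 .. row - 1 of this column.
  for (let r = 1; r <= row; r++) {
    let prevRow = (r - 1) * width + col;
    sum += this.constants.pixels[prevRow];
  }

  return this.constants.pixels[base] + sum;
}

const gpuKernel = gpu.createKernel(kFunction)
  .setConstants({ pixels: pixels, width: 4, height: 5 })
  .setOutput([4 * 5]);

console.log(gpuKernel());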
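The cost of that recomputation can be counted directly. A thread whose output cell is in row r runs r loop iterations, so with W columns and H rows (symbols introduced here for illustration) the loop performs

```latex
\sum_{r=0}^{H-1} W \cdot r = W \cdot \frac{H(H-1)}{2}
\quad\text{additions, e.g. } 4 \cdot \frac{5 \cdot 4}{2} = 40 \text{ for the } 4 \times 5 \text{ example,}
```

compared with W(H - 1) = 16 additions for the sequential loop. The total work grows quadratically with the number of rows, which is consistent with the slowdown described above.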

I believe the ideal may be to somehow feed each row's output into the next row's input, but I'm unsure of how to do that or if it's even possible. Perhaps combineKernels or pipelining could be used to achieve this? Sorry, I'm still very new to the library.
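A standard way to avoid both the per-thread loop and one launch per row is a parallel prefix sum (scan). This is a swapped-in technique, not something proposed in this thread or a built-in GPU.js feature. The sketch below is a CPU-only simulation of a Hillis-Steele inclusive scan down each column; `scanPass` and `columnPrefixSumScan` are hypothetical names. Each pass reads only the previous pass's buffer, so every cell of a pass could be an independent GPU thread; mapping each pass to one pipelined kernel launch is an assumption, not verified here against GPU.js.

```javascript
// CPU simulation of a Hillis-Steele inclusive scan applied down each column.
// Every cell of a pass depends only on the previous pass's buffer, so all
// cells of one pass could be computed in parallel.
function scanPass(input, width, height, offset) {
  const out = new Uint32Array(width * height);
  for (let row = 0; row < height; row++) {
    for (let col = 0; col < width; col++) {
      const idx = row * width + col;
      // Add the value `offset` rows above, if that row exists.
      const above = row >= offset ? input[idx - offset * width] : 0;
      out[idx] = input[idx] + above;
    }
  }
  return out;
}

function columnPrefixSumScan(pixels, width, height) {
  let current = Uint32Array.from(pixels);
  let passes = 0;
  // The offset doubles each pass (1, 2, 4, ...), giving ceil(log2(height)) passes.
  for (let offset = 1; offset < height; offset *= 2) {
    current = scanPass(current, width, height, offset);
    passes++;
  }
  return { result: current, passes };
}

const pixels = new Uint32Array([
  3, 1, 5, 2,
  8, 9, 2, 1,
  1, 2, 7, 4,
  9, 6, 3, 5,
  8, 2, 6, 3,
]);
const { result, passes } = columnPrefixSumScan(pixels, 4, 5);
console.log(Array.from(result), passes);
```

The trade-off: each pass does about width x height additions, so total work rises to roughly width x height x log2(height), while the number of dependent steps drops from height - 1 to ceil(log2(height)). For this 4x5 example that is 3 passes instead of 4, so it only pays off for much taller matrices.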

robertleeplummerjr commented 3 years ago

It sounds like the approach you have would probably be most efficient.

If you used pipelining to keep the values on the GPU as textures, you wouldn't pay for transferring memory back and forth between the CPU and the GPU. However, you would still pay for setting up and running each kernel, which is fairly expensive compared to a simple for loop.

midnight-dev commented 3 years ago

@robertleeplummerjr I don't have a specific scenario in my head right now, but this kind of mass-arithmetic is always intriguing to me. Given your extensive experience with JavaScript and GPU computing, do you think traditional for loops in JavaScript would be faster when iterating through a long but shallow array - especially when it's just adding previous values together? Even without worker threads and WebAssembly, I mean.

I know GPU.js excels at wide arrays of data and anything that maps well to a 2D raster, but I've never looked at how steep a penalty there is for creating a kernel and reading pixels back after a calculation is finished. The above example doesn't strike me as a workload that can be sped up enough to overcome that overhead.

robertleeplummerjr commented 3 years ago

> do you think traditional for loops in JavaScript would be faster when iterating through a long but shallow array - especially when it's just adding previous values together?

It is going to depend heavily on what surrounds this math. If this were the only operation and the matrices were rather small, then I'd probably say there isn't much need for GPU.js, or for the GPU at all for that matter. However, if there are many operations that require the GPU and only this one step ran on the CPU, then I'd say use the GPU. In Brain.js, for example, using the newer GPU architecture, we use the GPU to calculate just one number (the error, i.e. the sum of all errors within a single matrix). We do that on the GPU because the memory already resides there: the cost of transferring it to the CPU and then reading it back would outweigh the benefit, and it would block the whole operation while the GPU synchronizes.
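Summing a whole matrix into one number, as in the error calculation mentioned above, is a reduction. One generic way a GPU performs it is a pairwise tree reduction that halves the number of values each pass. The sketch below is a CPU-only illustration of that technique, not Brain.js's or GPU.js's actual implementation; `reducePass` and `treeSum` are hypothetical names.

```javascript
// CPU simulation of a pairwise (tree) reduction. Each pass halves the number
// of live values, and every output element of a pass is independent, so a
// GPU could compute a whole pass in parallel.
function reducePass(values) {
  const half = Math.ceil(values.length / 2);
  const out = new Float64Array(half);
  for (let i = 0; i < half; i++) {
    // Pair element i with element i + half, if it exists.
    const j = i + half;
    out[i] = values[i] + (j < values.length ? values[j] : 0);
  }
  return out;
}

function treeSum(values) {
  let current = Float64Array.from(values);
  let passes = 0;
  while (current.length > 1) {
    current = reducePass(current);
    passes++;
  }
  // An empty input sums to 0.
  return { sum: current.length === 1 ? current[0] : 0, passes };
}

const pixels = new Uint32Array([
  3, 1, 5, 2,
  8, 9, 2, 1,
  1, 2, 7, 4,
  9, 6, 3, 5,
  8, 2, 6, 3,
]);
const { sum, passes } = treeSum(pixels);
console.log(sum, passes); // 87 5
```

Twenty values need ceil(log2(20)) = 5 passes. If the data already lives on the GPU, only the final single number has to be read back.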

> The above example doesn't strike me as a workload that can be sped up enough to overcome the apparent overhead.

You are probably correct, but again, context needs to be considered.

midnight-dev commented 3 years ago

That makes sense. Thanks for the detailed response. By the way, when you mentioned new GPU architecture, was that a nod to tensor ops and such? I haven't bought a GPU in a hot minute, so the whole tensor & bounding box acceleration hardware is pretty foreign to me. However, if the web browser can make use of such features, I may have to hop on eBay.