gpujs / gpu.js

GPU Accelerated JavaScript
https://gpu.rocks
MIT License

Performance #601

Closed · jadsongmatos closed this issue 4 years ago

jadsongmatos commented 4 years ago

I'm doing a very basic test of the same loop on the CPU and on the GPU, but when I run the code on the GPU it is about 30% slower.

GPU

```js
const { GPU } = require('../src');
const gpu = new GPU({ mode: 'gpu' });
const { performance } = require('perf_hooks');

var t0 = performance.now();

function kernelFunction(a, b) {
  var sum = 0;
  for (var i = 0; sum < 33554432; i++) {
    sum += a[this.thread.y] + b[this.thread.x];
  }
  return sum;
}

const kernel = gpu.createKernel(kernelFunction, { output: [1] });
const result = kernel([1], [1]);

var t1 = performance.now();
console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.");
console.log(result[0]);
```

CPU

```js
const { performance } = require('perf_hooks');

var t0 = performance.now();

function cpu(a, b) {
  var sum = 0;
  for (var i = 0; sum < 33554432; i++) {
    sum += a + b;
  }
  return sum;
}

const result = cpu(1, 1);

var t1 = performance.now();
console.log("Call to doSomething took " + (t1 - t0) + " milliseconds.");
console.log(result);
```

robertleeplummerjr commented 4 years ago

You are uploading and downloading arrays of virtually no size, so of course the CPU is faster. Also, the work inside the kernel is sequential, so you are essentially using one thread on the GPU. You need to find a way to make that operation parallel, so as to take advantage of the GPU. There is overhead to using the GPU, especially when you aren't even allowing for any throughput.
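To make the overhead argument above concrete, here is a back-of-envelope model: the GPU pays a fixed per-call cost (data transfer plus kernel launch), so it only wins once the workload is large and parallel enough to amortize it. All numbers below (`nsPerElement`, `overheadMs`, `threads`) are illustrative assumptions, not measurements of gpu.js.

```js
// Rough cost model, plain Node.js (no gpu.js dependency).
function cpuTimeMs(elements, nsPerElement) {
  return (elements * nsPerElement) / 1e6;
}

function gpuTimeMs(elements, nsPerElement, overheadMs, threads) {
  // Work is split across threads, but every call pays the fixed overhead.
  return overheadMs + (elements * nsPerElement) / threads / 1e6;
}

const nsPerElement = 10; // assumed cost of one addition
const overheadMs = 5;    // assumed upload/download + launch cost
const threads = 1024;    // assumed number of parallel GPU threads

// Tiny workload on one thread (the kernel in the question): overhead dominates.
console.log(cpuTimeMs(1, nsPerElement) < gpuTimeMs(1, nsPerElement, overheadMs, 1)); // true: CPU wins

// Large workload spread across many threads: the overhead is amortized.
const n = 100000000;
console.log(gpuTimeMs(n, nsPerElement, overheadMs, threads) < cpuTimeMs(n, nsPerElement)); // true: GPU wins
```

The crossover point depends entirely on the assumed constants, but the shape of the trade-off is what matters: a single-element, single-thread kernel can never recover the fixed per-call cost.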

jadsongmatos commented 4 years ago

> You are uploading and downloading arrays of virtually no size, of course it is faster. Too the work on the kernel is synchronous, so you are essentially using 1 thread on the GPU. You need to find a way to make that operation asynchronous, so to take advantage of the GPU. There is overhead to use the GPU, especially when you aren't even allowing for throughput.

How do I use more threads?

harshkhandeparkar commented 4 years ago

The output size {output: [x, y, z]} determines the number of threads used. Each combination of x, y, and z represents one thread.

The output is an array which is either 1d, 2d or 3d. Each element of this output array is computed by one thread on the GPU and all these threads compute the output in parallel.
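That mapping can be sketched on the CPU. The helper below is a hypothetical emulation of what the explanation describes, not gpu.js internals: for an `output: [width, height]` kernel, the body runs once per `(x, y)` coordinate, each invocation seeing its own `this.thread.x` / `this.thread.y`. On a real GPU those invocations are the threads that run in parallel.

```js
// Minimal CPU emulation of gpu.js's output-to-thread mapping (illustrative only).
function runKernel2D(kernelBody, width, height, ...args) {
  const out = [];
  for (let y = 0; y < height; y++) {
    const row = new Float32Array(width);
    for (let x = 0; x < width; x++) {
      // On the GPU, each of these iterations would be one independent thread.
      row[x] = kernelBody.call({ thread: { x, y } }, ...args);
    }
    out.push(row);
  }
  return out;
}

// Same kernel body shape as in the question, minus the sequential loop:
// each thread computes a single output element independently.
function kernelFunction(a, b) {
  return a[this.thread.y] + b[this.thread.x];
}

const result = runKernel2D(kernelFunction, 4, 4, [1, 1, 1, 1], [2, 2, 2, 2]);
console.log(result.length);  // 4 rows
console.log(result[0][0]);   // 3
```

This is why the original kernel with `output: [1]` cannot be fast: it produces one output element, so only one thread does all the work, and the per-call overhead is never amortized.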