tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Slow initial runs followed by blazing fast computation #8415

Open h-OUS-e opened 6 days ago

h-OUS-e commented 6 days ago

Hello,

I am running into an issue where an initial computation, a basic matrix multiplication using TFJS, is extremely slow: the first run takes 800ms, while the second run takes 5ms. I mocked up a simpler version for debugging and get ~90ms on the first run and 1ms on subsequent runs. The JS code is here:

document.getElementById("testOptim").addEventListener('click', function () {
  console.time('testOptim: ');
  let a = tf.randomNormal([15, 3]);
  let b = tf.randomNormal([15, 15]);
  let c = tf.matMul(b, a);
  console.timeEnd('testOptim: ');
});

And I am using the latest tfjs module.

Based on previously reported issues, I suspect the problem is the slow initial compilation and caching of the kernels, but I couldn't find a good solution for it. Any directions or thoughts would be appreciated!

shmishra99 commented 1 day ago

Hi @h-OUS-e,

I was able to reproduce the behavior with the code you shared. The initial slowness is due to loading and compiling the kernel. Once the kernel is loaded and compiled, it is cached for reuse.

Code:

console.time('testOptim1: ');
let a = tf.randomNormal([15, 3]);
let b = tf.randomNormal([15, 15]);
let c = tf.matMul(b, a);
console.timeEnd('testOptim1: ');

console.time('testOptim: 2 ');
a = tf.randomNormal([16, 4]);
b = tf.randomNormal([16, 16]);
c = tf.matMul(b, a);
console.timeEnd('testOptim: 2 ');

console.time('testOptim: 3 ');
a = tf.randomNormal([17, 4]);
b = tf.randomNormal([17, 17]);
c = tf.matMul(b, a);
console.timeEnd('testOptim: 3 ');

console.time('testOptim: 4 ');
a = tf.randomNormal([18, 5]);
b = tf.randomNormal([18, 18]);
c = tf.matMul(b, a);
console.timeEnd('testOptim: 4 ');

Results:

[Screenshot: console timings, with testOptim1 noticeably slower than testOptim 2, 3, and 4]

From the results, you can see that the first run (testOptim1) takes much longer than the subsequent runs (testOptim 2, testOptim 3, and testOptim 4). The first run pays the cost of loading and compiling the kernel; the later multiplications are fast because the compiled kernel is cached. To improve performance, you could consider warming up the kernel during tfjs initialization by running the same operations on dummy tensors, as in the sketch below.
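
A minimal warm-up sketch, assuming the WebGL backend and that [200, 200] is a representative worst-case shape for your workload (the shape and the warmup* variable names are placeholders, not anything tfjs requires):

// Run once at startup, before the user triggers any real computation.
tf.ready().then(async () => {
  const warmupA = tf.randomNormal([200, 200]);
  const warmupB = tf.randomNormal([200, 200]);
  const warmupC = tf.matMul(warmupA, warmupB);
  await warmupC.data(); // wait for the kernel to actually run and for the readback path to be exercised
  tf.dispose([warmupA, warmupB, warmupC]);
});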

Let me know if this helps!

Thank you!

h-OUS-e commented 1 day ago

Thank you @shmishra99. Is the loading and compiling of the kernel always this slow? Preloading dummy tensors on initialization seemed to solve the issue. But for my main application, the range of possible tensors to be multiplied with each other is too large, and loading them all would make the initial page load very slow. I would basically need to load every possible dummy tensor, where the largest would be of shape [200, 200] or more. Is there a better solution?

For now, I set the backend to the CPU using tf.setBackend('cpu'), and to my surprise it has been very fast, which appears to work for the needs of my application. Beyond that, I remain curious whether there is a way to still utilize the GPU without the slow kernel loading and compiling time.

Thanks!

shmishra99 commented 14 hours ago

@h-OUS-e, as per my understanding, the kernels will need to be loaded and compiled at some point, whether during the initial page load or the first time the tensors are executed. While the CPU backend performs well in terms of kernel loading, GPU backends are generally faster for the tensor operations themselves, so it may be worth measuring both on your actual shapes with something like the sketch below.
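
A rough comparison sketch (the timeMatMul helper and the shapes are placeholders I made up, not part of tfjs):

// Hypothetical helper: time one [n, n] x [n, n] matMul on the given backend.
async function timeMatMul(backend, n) {
  await tf.setBackend(backend);
  await tf.ready();
  const a = tf.randomNormal([n, n]);
  const b = tf.randomNormal([n, n]);
  const start = performance.now();
  const c = tf.matMul(a, b);
  await c.data(); // include the actual compute, not just op scheduling
  const elapsed = performance.now() - start;
  tf.dispose([a, b, c]);
  return elapsed;
}

// Run each backend twice and compare the second (warm) timings, e.g.:
// await timeMatMul('webgl', 200); await timeMatMul('webgl', 200);
// await timeMatMul('cpu', 200);   await timeMatMul('cpu', 200);

Thank You!!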

h-OUS-e commented 13 hours ago

I see, and you're right that the CPU is slower for larger operations. Would you know of a fast way to preload the kernels with dummy tensors, assuming I want to load all possible tensors of shape (n, n)? Thank you!

shmishra99 commented 13 hours ago

@h-OUS-e , To preload the tensors, you can use the following code:

tf.ready().then(async () => {
  // n is a placeholder: use the largest tensor size you expect to need
  const loadTensor = tf.randomNormal([n, n]);
  await loadTensor.data(); // wait for the kernel to run and the data to be read back
  loadTensor.dispose();
});

However, I will say that this approach is not highly recommended, as it may slow down the initial page loading time.
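
As a side note, in the timings above testOptim 2, 3, and 4 used different shapes than the first run and were still fast, so warming up with one or a few representative shapes may be enough; you likely do not need a dummy tensor for every possible (n, n).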

Please let me know if this helps. Thank you!