inducer / pycuda

CUDA integration for Python, plus shiny features
http://mathema.tician.de/software/pycuda

gpuarray.dot() is very slow on the first call #309

Open decoli opened 3 years ago

decoli commented 3 years ago

I found that the first call to gpuarray.dot() takes a long time. Here is my code:

...
# first call (start and end are pycuda.driver.Event objects)
start.record()
# res_gpu = gpuarray.dot(coef_gpu, image_gpu)
gpuarray.dot(coef_gpu, image_gpu)
end.record()
end.synchronize()
elapsed_ms = start.time_till(end)  # time_till() returns milliseconds
print("\ntime cost: {:.3f}ms\n".format(elapsed_ms)) # time cost: 813.931ms

# second call
start.record()
# res_gpu = gpuarray.dot(coef_gpu, image_gpu)
gpuarray.dot(coef_gpu, image_gpu)
end.record()
end.synchronize()
elapsed_ms = start.time_till(end)  # time_till() returns milliseconds
print("\ntime cost: {:.3f}ms\n".format(elapsed_ms)) # time cost: 0.056ms
...

Why does this happen, and how can I solve it?

inducer commented 3 years ago

That's because the first time the function is called, a few kernels are compiled behind the scenes to do the work. The basic assumption is that your program will run for long enough (otherwise, why are you using a GPU to speed it up?) that this cost will be more than amortized. Also, that cost should only be incurred once. The kernels should be in the disk cache after that, making them quick to load.

decoli commented 3 years ago

Thanks for your reply. I guess that, before the actual use of gpuarray.dot(), I can call it once just so that the kernels get compiled, like this:

...
gpuarray.dot(like_coef_gpu, like_image_gpu) # warm-up call, only to get the kernels compiled
...
...
gpuarray.dot(coef_gpu, image_gpu) # the real call

Is this a good solution?

inducer commented 3 years ago

If that works for your use case, then yes, that should avoid compilation/module load delays on subsequent runs of the kernel.
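The warm-up idea discussed above can be sketched roughly as follows. This is only an illustration under some assumptions: the helper names (timed_ms, warm_up, pycuda_demo) are made up for this example and are not part of PyCUDA, the placeholder arrays are assumed to have the same dtype as the real data (as far as I understand, the generated kernel depends on the dtypes), and pycuda_demo() needs PyCUDA plus a CUDA-capable GPU, so it is not called automatically.

```python
import time


def timed_ms(fn, *args, **kwargs):
    """Run fn once and return (result, wall-clock time in milliseconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - t0) * 1000.0


def warm_up(fn, *args, **kwargs):
    """Call fn once and discard the result, so any one-time setup cost
    (for example kernel compilation) is paid here rather than later."""
    fn(*args, **kwargs)


def pycuda_demo(n=1 << 20):
    """Hypothetical demo; requires PyCUDA and a CUDA-capable GPU."""
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.gpuarray as gpuarray

    # Placeholder inputs, assumed to match the dtype of the real data.
    like_coef_gpu = gpuarray.to_gpu(np.ones(n, dtype=np.float32))
    like_image_gpu = gpuarray.to_gpu(np.ones(n, dtype=np.float32))

    def dot_and_fetch(a, b):
        # .get() copies the result back to the host, so the wall-clock
        # timing also covers the kernel execution itself.
        return gpuarray.dot(a, b).get()

    # The first call pays the compilation or cache-load cost.
    _, first_ms = timed_ms(dot_and_fetch, like_coef_gpu, like_image_gpu)
    # The second call reuses the already loaded kernels.
    _, second_ms = timed_ms(dot_and_fetch, like_coef_gpu, like_image_gpu)
    print("first call: {:.3f} ms, second call: {:.3f} ms".format(first_ms, second_ms))
```

In a real program, something like warm_up(gpuarray.dot, like_coef_gpu, like_image_gpu) would go in the start-up code, before the time-critical section.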

decoli commented 3 years ago

Oh... I found that gpuarray.dot() behaves differently from numpy.dot().

It seems that linalg.dot() from scikit-cuda:

import skcuda.linalg as linalg
linalg.dot()

runs on the GPU and can be used together with PyCUDA.

However, it fails with this error: CUSOLVER library only available in CUDA 7.0 and later

New problem...
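On the difference between gpuarray.dot() and numpy.dot() mentioned in this comment: as far as I understand the PyCUDA documentation, gpuarray.dot() multiplies the two arrays element by element and sums everything into a single scalar, whereas numpy.dot() on 2-D arrays computes a matrix product. The pure-Python sketch below only illustrates those two operations on nested lists; flat_dot and matmul are hypothetical helper names and it does not call PyCUDA or NumPy.

```python
def flat_dot(a, b):
    """Sum of elementwise products over all elements: a single scalar.
    This mirrors what gpuarray.dot() is assumed to compute."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    return sum(x * y for x, y in zip(flat_a, flat_b))


def matmul(a, b):
    """2-D matrix product, which is what numpy.dot() gives for 2-D arrays."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

print(flat_dot(a, b))  # 1*5 + 2*6 + 3*7 + 4*8 = 70
print(matmul(a, b))    # [[19, 22], [43, 50]]
```

So if a matrix product is what is needed, a routine such as skcuda.linalg.dot() is the kind of function to look for, rather than gpuarray.dot().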