decoli opened this issue 3 years ago
That's because the first time the function is called, a few kernels are compiled behind the scenes to do the work. The basic assumption is that your program will run for long enough (otherwise, why are you using a GPU to speed it up?) that this cost will be more than amortized. Also, that cost should only be incurred once. The kernels should be in the disk cache after that, making them quick to load.
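For reference, a minimal timing sketch along these lines (my own example, assuming `pycuda.autoinit` and small float32 arrays) should make the gap between the first and the second call visible:

```python
import time
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a_gpu = gpuarray.to_gpu(np.random.rand(1024).astype(np.float32))
b_gpu = gpuarray.to_gpu(np.random.rand(1024).astype(np.float32))

# First call: the reduction kernel is compiled (or loaded), so it is slow.
t0 = time.time()
gpuarray.dot(a_gpu, b_gpu).get()  # .get() forces synchronization
print("first call:  %.3f s" % (time.time() - t0))

# Second call: the already-built kernel is reused, so only the GPU work remains.
t0 = time.time()
gpuarray.dot(a_gpu, b_gpu).get()
print("second call: %.3f s" % (time.time() - t0))
```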
Thanks for your reply. I guess that before the actual use of gpuarray.dot(), I can call it once beforehand so the kernels get compiled, like this:

```python
...
gpuarray.dot(like_coef_gpu, like_image_gpu)  # warm-up call, just so the kernels get compiled
...
...
gpuarray.dot(coef_gpu, image_gpu)  # the real call
```
Is this a good solution?
If that works for your use case, then yes, that should avoid compilation/module load delays on subsequent runs of the kernel.
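One way to package that idea, purely as a sketch (`warm_up_dot` is a made-up helper name, not part of pycuda):

```python
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

def warm_up_dot(dtype=np.float32):
    # Hypothetical helper: run gpuarray.dot once on tiny dummy arrays so the
    # reduction kernel is compiled/loaded before any latency-critical call.
    dummy = gpuarray.to_gpu(np.ones(16, dtype=dtype))
    gpuarray.dot(dummy, dummy).get()

warm_up_dot()  # pay the one-time cost here, at startup
# ... later calls to gpuarray.dot() on arrays of the same dtype should be fast
```

As far as I can tell, the generated kernel depends on the input dtype rather than the array size, so the warm-up should use the same dtype as the real data.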
Oh... I found that gpuarray.dot() is different from numpy.dot().
It seems that skcuda.linalg can be regarded as a package that runs on the GPU and can be used together with pycuda:

```python
import skcuda.linalg as linalg
linalg.dot()
```
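For anyone comparing the two, here is a small sketch of the difference as I understand it (assuming skcuda is installed and `linalg.init()` succeeds): pycuda's gpuarray.dot() reduces everything to a single scalar, while skcuda.linalg.dot() behaves like numpy.dot():

```python
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import skcuda.linalg as linalg

linalg.init()  # required before using skcuda.linalg routines

a = np.random.rand(4, 4).astype(np.float32)
b = np.random.rand(4, 4).astype(np.float32)
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)

# pycuda: elementwise multiply + reduce -> one scalar, not a matrix product
scalar = gpuarray.dot(a_gpu, b_gpu).get()

# skcuda: matrix multiplication via CUBLAS, like numpy.dot()
c_gpu = linalg.dot(a_gpu, b_gpu)

print(np.allclose(c_gpu.get(), a.dot(b)))   # should print True
print(np.allclose(scalar, (a * b).sum()))   # pycuda's result is the elementwise sum
```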
However, it raises this error:
CUSOLVER library only available in CUDA 7.0 and later
New problem...
I found that the first call to gpuarray.dot() takes a long time. Here is my code:
Why does this happen, and how can I solve it?