[Course-2020-2023] taught at Duke MIDS. This is also a Coursera course that covers MLOps, ML engineering, and the foundations of cloud computing for data science.
In the last Mandelbrot example, the calculation isn't actually done with CUDA: it runs through @jit, because the create_fractal function is called. To run it on the GPU you need to launch the mandel_kernel function and, of course, copy the data to the device first, as you said.
The interesting question is why the second call to create_fractal was faster: Numba has to compile the function on the first call, which takes time. On the second call the cached machine code runs, so it is much faster.
The issue is found in https://github.com/noahgift/cloud-data-analysis-at-scale/blob/master/GPU_Programming.ipynb.