numba / numba

NumPy aware dynamic Python compiler using LLVM
https://numba.pydata.org/
BSD 2-Clause "Simplified" License

Expose CUDA controls for shared memory configuration #2574

Open seibert opened 7 years ago

seibert commented 7 years ago

CUDA has allowed the shared memory bank configuration to be altered since Kepler:

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1ga4f3f8a422968f9524012f43ba852058

This is important for avoiding bank conflicts when working with float64 arrays in shared memory. We should expose getter and setter methods for the shared memory bank configuration on the CUDA device.
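For illustration, roughly what user code might look like (the method and constant names below are hypothetical; nothing like this exists in Numba yet):

```python
from numba import cuda

# Hypothetical API sketch -- the set/get methods and the constant
# below do not exist in Numba today.
dev = cuda.get_current_device()

# Request 8-byte shared memory banks to avoid conflicts with float64 data.
dev.set_shared_mem_config(cuda.SHARED_MEM_CONFIG_EIGHT_BYTE)

# A matching getter would let users inspect the current configuration.
assert dev.get_shared_mem_config() == cuda.SHARED_MEM_CONFIG_EIGHT_BYTE
```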

dgerzhoy commented 5 years ago

Any updates on this issue? Thanks

seibert commented 5 years ago

We haven't done anything on this. Noticing that this was a CUDA runtime API call (which we can't use), rather than a CUDA driver API call (which we do use), I went looking for the equivalent call in the driver API:

https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g430b913f24970e63869635395df6d9f5

Strangely, in the runtime API the bank configuration is set on the device, whereas in the driver API it can be set on a per-kernel basis. (How that works for concurrent kernels is a mystery to me...)

To add this feature, the C function would need to be registered with: https://github.com/numba/numba/blob/master/numba/cuda/cudadrv/drvapi.py
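An untested sketch of what that registration might look like, assuming the tuple layout (return type first, then argument types) used by the existing API_PROTOTYPES entries in drvapi.py:

```python
from ctypes import c_int, c_uint, c_void_p

cu_function = c_void_p  # alias matching the one defined in drvapi.py

API_PROTOTYPES = {
    # ... existing entries ...

    # CUresult cuFuncSetSharedMemConfig(CUfunction hfunc,
    #                                   CUsharedconfig config);
    'cuFuncSetSharedMemConfig': (c_int, cu_function, c_uint),
}
```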

And then an API would need to be added (probably in CUDAKernelBase: https://github.com/numba/numba/blob/master/numba/cuda/compiler.py#L311) to allow the configuration to be set on a given kernel.
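A rough sketch of that kernel-side method; the way the CUfunction handle is reached here is an assumption about Numba internals, so treat it as pseudocode:

```python
from numba.cuda.cudadrv import driver

# CUsharedconfig enum values from the CUDA driver API headers:
CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE = 0x00
CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE = 0x01
CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE = 0x02

def set_shared_mem_config(self, config):
    """Apply a shared memory bank-size configuration to this kernel."""
    cufunc = self._func.get()  # assumed route to the CUfunction handle
    driver.driver.cuFuncSetSharedMemConfig(cufunc.handle, config)
```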

collinmccarthy commented 4 years ago

I understand the challenges involved in doing this at runtime, but is there any way to simply configure the shared memory when compiling the kernel? Even if all kernels needed to use the same configuration, that would still allow us to test different shared memory configurations. Thank you!
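For example, would something like this untested ctypes sketch work to set the configuration device-wide through the runtime API? It assumes libcudart is on the loader path and that calling the runtime API alongside Numba's driver-API usage is safe:

```python
import ctypes

# Untested sketch: set an 8-byte bank size device-wide via the runtime API,
# before launching any Numba kernels.
cudaSharedMemBankSizeEightByte = 2  # from the cudaSharedMemConfig enum

libcudart = ctypes.CDLL('libcudart.so')
err = libcudart.cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte)
assert err == 0, f'cudaDeviceSetSharedMemConfig failed with error {err}'
```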