seibert opened this issue 7 years ago
Any updates on this issue? Thanks!
We haven't done anything on this. Noticing that this was a CUDA runtime API call (which we can't use) rather than a CUDA driver API call (which we do use), I went looking for the equivalent call in the driver API: cuFuncSetSharedMemConfig.
Strangely, in the runtime API the bank configuration is set on the device, whereas in the driver API it can be set on a per-kernel basis. (How that works for concurrent kernels is a mystery to me...)
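For reference, a minimal sketch of what the per-kernel driver call looks like when invoked through ctypes, which is how Numba binds driver functions. The library name and the placeholder handle are assumptions; the enum values and signature are from cuda.h:

```python
import ctypes

# Sketch only: call the driver API directly via ctypes. Assumes the CUDA
# driver library is on the loader path and that a kernel has already been
# loaded, so that a CUfunction handle is available.
libcuda = ctypes.CDLL('libcuda.so')

# enum CUsharedconfig values from cuda.h
CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE = 0
CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE = 1
CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE = 2

# CUresult cuFuncSetSharedMemConfig(CUfunction hfunc, CUsharedconfig config);
libcuda.cuFuncSetSharedMemConfig.restype = ctypes.c_int
libcuda.cuFuncSetSharedMemConfig.argtypes = [ctypes.c_void_p, ctypes.c_int]

def set_bank_size(cufunc_handle, config):
    """Set the shared memory bank size for one kernel (CUfunction)."""
    err = libcuda.cuFuncSetSharedMemConfig(cufunc_handle, config)
    if err != 0:  # CUDA_SUCCESS == 0
        raise RuntimeError('cuFuncSetSharedMemConfig failed: error %d' % err)
```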
To add this feature, the C function would need to be registered in numba/cuda/cudadrv/drvapi.py: https://github.com/numba/numba/blob/master/numba/cuda/cudadrv/drvapi.py
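A hedged sketch of what that registration might look like, following the (return_type, *arg_types) tuple convention that the API_PROTOTYPES dict in drvapi.py appears to use; the cu_function alias mirrors the opaque-handle typedefs defined in that module:

```python
from ctypes import c_int, c_uint, c_void_p

cu_function = c_void_p  # opaque CUfunction handle, as aliased in drvapi.py

# Hypothetical addition to the API_PROTOTYPES dict in drvapi.py:
# CUresult cuFuncSetSharedMemConfig(CUfunction hfunc, CUsharedconfig config);
API_PROTOTYPES = {
    'cuFuncSetSharedMemConfig': (c_int, cu_function, c_uint),
}
```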
Then an API would need to be added (probably on CUDAKernelBase: https://github.com/numba/numba/blob/master/numba/cuda/compiler.py#L311) to allow the configuration to be set on a given kernel.
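A hypothetical sketch of such a kernel-level method; the method name and the attribute holding the CUfunction handle are illustrative assumptions, not Numba's actual API:

```python
CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE = 2  # CUsharedconfig value from cuda.h

class CUDAKernelBase(object):
    def set_shared_memory_config(self, config):
        """Set the shared memory bank size used by this kernel."""
        from numba.cuda.cudadrv.driver import driver
        # Assumes cuFuncSetSharedMemConfig has been registered in drvapi.py
        # and that the compiled kernel exposes its CUfunction handle here;
        # the _func.handle attribute name is an assumption for this sketch.
        driver.cuFuncSetSharedMemConfig(self._func.handle, config)
```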
I understand the challenges involved in doing this at runtime, but is there any way to simply configure the shared memory bank size when compiling the kernel? Even if all kernels needed to use the same configuration, that would still allow us to test different shared memory configurations. Thank you!
CUDA has allowed the shared memory bank configuration to be altered since Kepler:
http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1ga4f3f8a422968f9524012f43ba852058
This is important for avoiding bank conflicts when working with float64 arrays in shared memory: with the default four-byte banks, each 64-bit access spans two banks, so consecutive float64 elements cause two-way conflicts that the eight-byte bank mode avoids. We should expose set and get methods for the shared memory configuration on the CUDA device.
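In the meantime, a rough user-level workaround is to call the runtime API directly through ctypes. This is only a sketch under assumptions: the library name varies by platform and CUDA version, and whether the runtime's context interoperates cleanly with Numba's driver-API context is not guaranteed. The signatures and enum values are from the runtime API headers:

```python
import ctypes

# Assumption: the CUDA runtime library is on the loader path under this name.
libcudart = ctypes.CDLL('libcudart.so')

# enum cudaSharedMemConfig values from driver_types.h
cudaSharedMemBankSizeDefault = 0
cudaSharedMemBankSizeFourByte = 1
cudaSharedMemBankSizeEightByte = 2

# cudaError_t cudaDeviceSetSharedMemConfig(enum cudaSharedMemConfig config);
libcudart.cudaDeviceSetSharedMemConfig.restype = ctypes.c_int
libcudart.cudaDeviceSetSharedMemConfig.argtypes = [ctypes.c_int]

# cudaError_t cudaDeviceGetSharedMemConfig(enum cudaSharedMemConfig *pConfig);
libcudart.cudaDeviceGetSharedMemConfig.restype = ctypes.c_int
libcudart.cudaDeviceGetSharedMemConfig.argtypes = [ctypes.POINTER(ctypes.c_int)]

def set_shared_mem_config(config):
    """Set the device-wide shared memory bank size."""
    err = libcudart.cudaDeviceSetSharedMemConfig(config)
    if err != 0:  # cudaSuccess == 0
        raise RuntimeError('cudaDeviceSetSharedMemConfig error %d' % err)

def get_shared_mem_config():
    """Return the current device-wide shared memory bank size."""
    config = ctypes.c_int()
    err = libcudart.cudaDeviceGetSharedMemConfig(ctypes.byref(config))
    if err != 0:
        raise RuntimeError('cudaDeviceGetSharedMemConfig error %d' % err)
    return config.value

# e.g. use eight-byte banks to avoid conflicts with float64 shared arrays
set_shared_mem_config(cudaSharedMemBankSizeEightByte)
```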