CUDA JIT LTO
Supporting CUDA 12 is essential for our project to work with C++20. Most importantly, the project's current use of NVVM link-time optimisation through the CUDA driver API will no longer work in CUDA 12:
From 12.0, JIT LTO support is now part of CUDA Toolkit. JIT LTO support in the CUDA Driver through the cuLink driver APIs is officially deprecated. Driver JIT LTO will be available only for 11.x applications. The following enums supported by the cuLink Driver APIs for JIT LTO are deprecated:
Device runtime compiler
Additionally, I no longer find the stateless design of STPDeviceRuntimeBinary and STPDeviceRuntimeProgram sensible. nvrtcProgram and CUmodule are themselves stateful, so there is no need to gather all compiler options into a custom data structure up front and pass it to a single function call.
Likewise, it is no longer necessary to write tests for that functionality, because the classes are just wrappers.
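One way to keep such a wrapper thin is to let a smart pointer own the stateful handle directly. The sketch below is only an illustration under assumptions: `FakeProgramImpl`, `fakeDestroyProgram`, `HandleDeleter` and `UniqueFakeProgram` are hypothetical names, and the fake handle stands in for a real one such as nvrtcProgram so that the snippet compiles without the CUDA Toolkit.

```cpp
#include <memory>

// Hypothetical stand-in for an opaque C handle such as nvrtcProgram.
// It records how many times it has been destroyed so the behaviour is observable.
struct FakeProgramImpl {
    int* destroyCount;
};
using FakeProgram = FakeProgramImpl*;

// Mimics the shape of a C destroy function that takes a pointer to the handle
// and returns 0 on success (nvrtcDestroyProgram has this shape).
inline int fakeDestroyProgram(FakeProgram* prog) {
    ++*(*prog)->destroyCount;
    delete *prog;
    *prog = nullptr;
    return 0;
}

// Deleter forwarding to a destroy function that expects a pointer to the handle.
template<auto Destroy>
struct HandleDeleter {
    template<class T>
    void operator()(T* handle) const noexcept {
        Destroy(&handle);
    }
};

// Owning handle: the underlying object is destroyed when this goes out of scope.
using UniqueFakeProgram =
    std::unique_ptr<FakeProgramImpl, HandleDeleter<&fakeDestroyProgram>>;
```

With the real library, the same pattern would presumably be `std::unique_ptr<std::remove_pointer_t<nvrtcProgram>, HandleDeleter<&nvrtcDestroyProgram>>`. Compiler options can then be passed straight to the library calls instead of being collected in a separate structure first.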
In short, make them simple and provide nothing more than a handful of handy helpers, such as automatic printing of the compiler log after compilation. Then produce a similar wrapper for the new nvJitLink.
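A log helper of this kind could look roughly like the sketch below. NVRTC and nvJitLink both expose their logs as a pair of calls, one returning the size and one filling a caller-provided buffer, so a single template can cover both. The names `fetchLog`, `FakeLinker`, `fakeGetLogSize` and `fakeGetLog` are hypothetical; the fake functions only imitate the shape of the library calls so that the snippet compiles without the CUDA Toolkit.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Fetches a log through a "query size, then fill buffer" pair of C functions.
// Both functions are assumed to return 0 on success.
template<class Handle, class SizeFn, class LogFn>
std::string fetchLog(Handle handle, SizeFn getSize, LogFn getLog) {
    std::size_t size = 0;
    if (getSize(handle, &size) != 0 || size == 0) {
        return {};
    }
    // One spare byte, in case the reported size excludes the trailing NUL.
    std::string log(size + 1, '\0');
    if (getLog(handle, log.data()) != 0) {
        return {};
    }
    // Trim the buffer down to the actual text.
    log.resize(std::strlen(log.c_str()));
    return log;
}

// Hypothetical stand-in for a compiler or linker handle holding a log message.
struct FakeLinker {
    const char* message;
};

// Reports the log size including the trailing NUL.
inline int fakeGetLogSize(const FakeLinker* handle, std::size_t* size) {
    *size = std::strlen(handle->message) + 1;
    return 0;
}

// Copies the log into the caller-provided buffer.
inline int fakeGetLog(const FakeLinker* handle, char* out) {
    std::strcpy(out, handle->message);
    return 0;
}
```

Against the real libraries, the call would presumably be `fetchLog(program, nvrtcGetProgramLogSize, nvrtcGetProgramLog)` for NVRTC and `fetchLog(linker, nvJitLinkGetErrorLogSize, nvJitLinkGetErrorLog)` for nvJitLink. The wrapper can then print the result after every compile or link step.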