This commit adds a --nvidia option that injects a library into the program under measurement. The injected library uses CUPTI to record entry into and exit from CUDA kernels.
We might consider bumping the CMake requirement to 3.24 with this version, as older versions of FindCUDAToolkit.cmake fail to correctly detect the CUPTI headers [1].
This implements #294
[1] https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7608