Open jinz2014 opened 6 months ago
@jinz2014, please add a more detailed description. What is the issue, and what are you requesting to be added or fixed?
7.13. Time Function (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#time-function)
clock_t clock();
long long int clock64();
when executed in device code, returns the value of a per-multiprocessor counter that is incremented every clock cycle. Sampling this counter at the beginning and at the end of a kernel, taking the difference of the two samples, and recording the result per thread provides a measure for each thread of the number of clock cycles taken by the device to completely execute the thread, but not of the number of clock cycles the device actually spent executing thread instructions. The former number is greater than the latter since threads are time sliced.
Example: https://github.com/zchee/cuda-sample/blob/master/0_Simple/clock/clock.cu
Related post: https://github.com/intel/llvm/issues/8191
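For reference, the per-thread sampling described in the quoted section might look roughly like the sketch below. The kernel name `timed_kernel`, the host helper `measure_cycles`, and the square-root loop used as a workload are all hypothetical placeholders, not taken from the linked sample. The compiler is also free to move some arithmetic across the two `clock64()` reads, so the numbers should be treated as approximate.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread samples the per-multiprocessor cycle counter before and after
// its work and records the difference. The result counts elapsed cycles on
// the multiprocessor, including cycles spent while other warps were running.
__global__ void timed_kernel(long long *cycles, float *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    long long start = clock64();

    // Placeholder workload (hypothetical); any per-thread work could go here.
    float acc = 0.0f;
    for (int i = 0; i < 1000; ++i) {
        acc += sqrtf(static_cast<float>(tid + i));
    }

    long long stop = clock64();

    out[tid] = acc;              // keep the loop from being optimized away
    cycles[tid] = stop - start;  // elapsed cycles as seen by this thread
}

// Hypothetical host helper: launches the kernel over n threads and returns
// the per-thread cycle counts. Error checking is omitted for brevity.
std::vector<long long> measure_cycles(int n) {
    assert(n > 0);
    long long *d_cycles = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_cycles, n * sizeof(long long));
    cudaMalloc(&d_out, n * sizeof(float));

    const int block = 128;
    const int grid = (n + block - 1) / block;
    timed_kernel<<<grid, block>>>(d_cycles, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<long long> cycles(n);
    cudaMemcpy(cycles.data(), d_cycles, n * sizeof(long long),
               cudaMemcpyDeviceToHost);
    cudaFree(d_cycles);
    cudaFree(d_out);
    return cycles;
}
```

Calling `measure_cycles(256)` from a `main` function and printing a few entries gives one cycle count per thread. Because the counter is per multiprocessor, only differences taken within a single thread are meaningful; values from threads on different multiprocessors should not be compared directly.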
Additional timing function in a CUDA kernel.
__global__ static void sleep_kernel() {
// __nanosleep function is not defined in SYCL. This is a hardware-specific feature. Consult with your hardware vendor to find a replacement.
__nanosleep(1);
}
Reference
auto now = cuda::std::chrono::system_clock::now();
auto now_ms = cuda::std::chrono::time_point_cast<cuda::std::chrono::milliseconds>(now);
auto epoch = now_ms.time_since_epoch();
https://nvidia.github.io/cccl/libcudacxx/standard_api/time_library/chrono.html
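A minimal sketch of how the `cuda::std::chrono` calls above could be used inside a kernel follows. It assumes that `system_clock::now()` is callable from device code in the installed libcu++ version and that milliseconds is the intended target duration. The names `epoch_kernel` and `device_ms_since_epoch` are hypothetical. Whether the device-side value tracks host wall-clock time closely is not verified here.

```cuda
#include <cuda/std/chrono>
#include <cuda_runtime.h>

// A single thread reads the system clock in device code and stores the
// number of milliseconds since the clock's epoch.
__global__ void epoch_kernel(long long *ms_since_epoch) {
    namespace chrono = cuda::std::chrono;
    auto now = chrono::system_clock::now();
    auto now_ms = chrono::time_point_cast<chrono::milliseconds>(now);
    auto epoch = now_ms.time_since_epoch();
    *ms_since_epoch = static_cast<long long>(epoch.count());
}

// Hypothetical host helper: runs the kernel once and returns the value.
// Error checking is omitted for brevity.
long long device_ms_since_epoch() {
    long long *d_ms = nullptr;
    cudaMalloc(&d_ms, sizeof(long long));

    epoch_kernel<<<1, 1>>>(d_ms);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    long long h_ms = 0;
    cudaMemcpy(&h_ms, d_ms, sizeof(long long), cudaMemcpyDeviceToHost);
    cudaFree(d_ms);
    return h_ms;
}
```

Unlike `clock64()`, which counts multiprocessor cycles, this returns a duration in time units, so it may be the more natural fit when a migrated kernel needs a timestamp and not a cycle count.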