intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

Timing measurement in a SYCL kernel #12959

Open jinz2014 opened 6 months ago

jinz2014 commented 6 months ago
auto now = sycl::std::chrono::system_clock::now();
auto now_ms = sycl::std::chrono::time_point_cast<sycl::std::chrono::milliseconds>(now);
auto epoch = now_ms.time_since_epoch();

Reference:
auto now = cuda::std::chrono::system_clock::now();
auto now_ms = cuda::std::chrono::time_point_cast<cuda::std::chrono::milliseconds>(now);
auto epoch = now_ms.time_since_epoch();

https://nvidia.github.io/cccl/libcudacxx/standard_api/time_library/chrono.html
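For readers unfamiliar with the pattern, here is a minimal host-side sketch of what the three lines compute, written against the standard `std::chrono` interface that the libcu++ chrono library mirrors. It is not device code and not a SYCL API; the function name `epoch_millis` is hypothetical and used only for illustration.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of SYCL or libcu++): returns the number of
// milliseconds since the system_clock epoch, using the same three steps as
// the snippet in the issue.
std::int64_t epoch_millis() {
    // 1. Sample the wall clock.
    auto now = std::chrono::system_clock::now();
    // 2. Truncate the time point to millisecond resolution.
    auto now_ms = std::chrono::time_point_cast<std::chrono::milliseconds>(now);
    // 3. Convert the time point to a duration measured from the epoch.
    auto epoch = now_ms.time_since_epoch();
    return static_cast<std::int64_t>(epoch.count());
}
```

Calling `epoch_millis()` on the host yields a single integer timestamp; the request in this issue is for an equivalent that can be called inside a kernel.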

dkhaldi commented 5 months ago

@jinz2014, please add a more detailed description. What is the issue, and what are you requesting to be added or fixed?

jinz2014 commented 5 months ago

7.13. Time Function (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#time-function)

clock_t clock();
long long int clock64();

when executed in device code, returns the value of a per-multiprocessor counter that is incremented every clock cycle. Sampling this counter at the beginning and at the end of a kernel, taking the difference of the two samples, and recording the result per thread provides a measure for each thread of the number of clock cycles taken by the device to completely execute the thread, but not of the number of clock cycles the device actually spent executing thread instructions. The former number is greater than the latter since threads are time sliced.

Example: https://github.com/zchee/cuda-sample/blob/master/0_Simple/clock/clock.cu

Related post: https://github.com/intel/llvm/issues/8191
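The sample-at-start, sample-at-end pattern described in the quoted passage can be sketched on the host with a monotonic clock. This is only an analogy in standard C++, not device code and not an existing SYCL facility; `elapsed_ns` and `busy_sum` are hypothetical names introduced for illustration.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: samples a monotonic clock before and after `work`
// and returns the difference in nanoseconds.
template <typename Work>
std::int64_t elapsed_ns(Work&& work) {
    const auto start = std::chrono::steady_clock::now();  // sample at the beginning
    work();
    const auto stop = std::chrono::steady_clock::now();   // sample at the end
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Placeholder workload: sums 0..n-1 through a volatile so the loop is not
// optimized away.
std::uint64_t busy_sum(std::uint64_t n) {
    volatile std::uint64_t sink = 0;
    for (std::uint64_t k = 0; k < n; ++k) {
        sink = sink + k;
    }
    return sink;
}
```

A call such as `elapsed_ns([] { busy_sum(100000); })` returns the elapsed time from start to finish. As with `clock()`, that figure includes any time the thread was descheduled, so it can exceed the time actually spent executing instructions.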

jinz2014 commented 5 months ago

Additional timing function in a CUDA kernel.

__global__ static void sleep_kernel() {
  // __nanosleep function is not defined in SYCL. This is a hardware-specific feature. Consult with your hardware vendor to find a replacement.
  __nanosleep(1);
}