Timing measurement in a SYCL kernel

jinz2014 commented 6 months ago

auto now = sycl::std::chrono::system_clock::now();
auto now_ms = sycl::std::chrono::time_point_cast<sycl::std::chrono::milliseconds>(now);
auto epoch = now_ms.time_since_epoch();

Reference:

auto now = cuda::std::chrono::system_clock::now();
auto now_ms = cuda::std::chrono::time_point_cast<cuda::std::chrono::milliseconds>(now);
auto epoch = now_ms.time_since_epoch();

https://nvidia.github.io/cccl/libcudacxx/standard_api/time_library/chrono.html
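
For comparison, the same three steps in standard host C++ are sketched below. libcu++ mirrors the std::chrono interface under cuda::std, so only the namespace differs. The helper name epoch_milliseconds is hypothetical and used only for illustration; nothing here implies that a sycl::std::chrono namespace exists today.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of SYCL or libcu++): returns milliseconds
// since the system_clock epoch, using the same three steps as the snippets
// above but with the host std::chrono namespace.
std::int64_t epoch_milliseconds() {
    namespace chr = std::chrono;
    auto now = chr::system_clock::now();                         // current time point
    auto now_ms = chr::time_point_cast<chr::milliseconds>(now);  // truncate to ms
    auto epoch = now_ms.time_since_epoch();                      // a milliseconds duration
    return static_cast<std::int64_t>(epoch.count());
}
```

Calling epoch_milliseconds() twice and subtracting the results gives a coarse host-side elapsed time; the request in this issue is for an equivalent that can be called from device code.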

dkhaldi commented 5 months ago

@jinz2014, please add a more detailed description. What is the issue, and what are you requesting to be added or fixed?

jinz2014 commented 5 months ago

7.13. Time Function (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#time-function)

clock_t clock();
long long int clock64();

when executed in device code, returns the value of a per-multiprocessor counter that is incremented every clock cycle. Sampling this counter at the beginning and at the end of a kernel, taking the difference of the two samples, and recording the result per thread provides a measure for each thread of the number of clock cycles taken by the device to completely execute the thread, but not of the number of clock cycles the device actually spent executing thread instructions. The former number is greater than the latter since threads are time sliced.

Example: https://github.com/zchee/cuda-sample/blob/master/0_Simple/clock/clock.cu
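
The sampling pattern in the quoted section (read a counter at the start, read it again at the end, store the difference per thread) can be illustrated on the host. This is only an analogy under stated assumptions: it uses std::chrono::steady_clock and std::thread rather than the device clock() counter, and time_per_thread is a hypothetical helper, not an existing SYCL or CUDA API.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: runs work(i) on n host threads and records, for each
// thread, the nanoseconds between a sample taken at its start and one taken
// at its end. The callable must be safe to invoke concurrently.
template <typename Work>
std::vector<std::int64_t> time_per_thread(std::size_t n, Work work) {
    std::vector<std::int64_t> elapsed(n, 0);
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&elapsed, &work, i] {
            auto start = std::chrono::steady_clock::now();  // first sample
            work(i);
            auto stop = std::chrono::steady_clock::now();   // second sample
            // Each thread writes only its own slot, so no locking is needed.
            elapsed[i] = std::chrono::duration_cast<std::chrono::nanoseconds>(
                             stop - start).count();
        });
    }
    for (auto& t : threads) t.join();
    return elapsed;
}
```

As with the device counter, each recorded value is the span from start to finish of a thread, including any time it was descheduled, not the time spent executing its instructions.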

jinz2014 commented 5 months ago

Additional timing function in a CUDA kernel.

__global__ static void sleep_kernel() {
  // __nanosleep function is not defined in SYCL. This is a hardware-specific feature. Consult with your hardware vendor to find a replacement.
  __nanosleep(1);
}

intel / llvm

Timing measurement in a SYCL kernel #12959