zeratax / yacx

Yet Another CudaExecutor - wrapper to easily compile and execute cuda kernels
https://zeratax.github.io/yacx
MIT License

use extra event instead of sum of events #95

zeratax closed 4 years ago

zeratax commented 4 years ago

fixes #50

Not sure if this needs an extra synchronization; see this passage quoted from the NVIDIA blog post linked below:

In addition to the two calls to the generic host time-stamp function myCPUTimer(), we use the explicit synchronization barrier cudaDeviceSynchronize() to block CPU execution until all previously issued commands on the device have completed. Without this barrier, this code would measure the kernel launch time and not the kernel execution time.

https://devblogs.nvidia.com/how-implement-performance-metrics-cuda-cc/

The quoted example uses host timers rather than events, but we should consider adding another synchronization event and distinguishing between kernel launch time and kernel execution time.
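To make the launch-versus-execution distinction concrete, here is a minimal host-only sketch in C++. It is not yacx code and does not touch the GPU: `launch_fake_kernel`, `time_fake_kernel`, and `Timing` are hypothetical names, and the "kernel" is simulated by a sleeping `std::async` task. It only illustrates why a timestamp taken without a barrier captures launch overhead, while a timestamp taken after waiting captures the full execution.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;

// Convert the distance between two time points to milliseconds.
double elapsed_ms(Clock::time_point begin, Clock::time_point end) {
  return std::chrono::duration<double, std::milli>(end - begin).count();
}

// Hypothetical stand-in for an asynchronous kernel launch: the call returns
// right away while the "work" (a sleep) runs on another thread.
std::future<void> launch_fake_kernel(std::chrono::milliseconds work) {
  return std::async(std::launch::async,
                    [work] { std::this_thread::sleep_for(work); });
}

// Hypothetical result type holding both measurements.
struct Timing {
  double launch_ms;     // measured without a barrier
  double execution_ms;  // measured after waiting for completion
};

Timing time_fake_kernel(std::chrono::milliseconds work) {
  const auto start = Clock::now();
  std::future<void> pending = launch_fake_kernel(work);
  // No barrier yet: this timestamp only covers the cost of issuing the work.
  const auto launched = Clock::now();
  // Barrier: block until the work has finished. In the quoted passage this
  // role is played by cudaDeviceSynchronize().
  pending.wait();
  const auto finished = Clock::now();
  return Timing{elapsed_ms(start, launched), elapsed_ms(start, finished)};
}
```

Calling `time_fake_kernel(std::chrono::milliseconds(50))` should report an `execution_ms` of roughly 50 ms or more and a much smaller `launch_ms`. With CUDA events, the corresponding wait would be `cudaEventSynchronize()` on the stop event before calling `cudaEventElapsedTime()`.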

LukasSiefke commented 4 years ago

I think one more synchronization for finish would be good, but I'm not entirely sure either.