The total execution times of the CUDA and SYCL programs on an Nvidia GPU (sm_86) are 1.04 s and 1.86 s, respectively. The profiler shows that the SYCL kernel's execution time on the device is similar to the CUDA kernel's. So the overhead of executing the SYCL kernel, beyond its device time, accounts for most of the 0.82 s difference and is significant in this case.
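One way to make this comparison concrete is to time each program end to end on the host and subtract the kernel time the profiler reports. Whatever remains is time spent outside the kernel (runtime initialization, launch, memory transfers, and so on). Below is a minimal C++ sketch, assuming the workload can be wrapped in a callable. `wall_seconds` and `non_kernel_seconds` are hypothetical helper names and are not part of either benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>

// Hypothetical helper: mean wall-clock time in seconds of `work` over `repeats` runs.
// steady_clock is used because it is monotonic and unaffected by system clock changes.
double wall_seconds(const std::function<void()>& work, int repeats = 1) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) work();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / repeats;
}

// Hypothetical helper: time spent outside the device kernel, i.e. the
// end-to-end time minus the kernel time reported by a profiler.
double non_kernel_seconds(double total_s, double kernel_s) {
    assert(kernel_s >= 0.0 && total_s >= kernel_s);
    return total_s - kernel_s;
}
```

When the two kernels take roughly the same device time, as reported above, most of the 1.86 s minus 1.04 s = 0.82 s gap has to come from the non-kernel portion of the SYCL program.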
CUDA version: https://github.com/zjin-lcf/HeCBench/tree/master/src/reverse-cuda
SYCL version: https://github.com/zjin-lcf/HeCBench/tree/master/src/reverse-sycl