[SYCL][CUDA] reduce overhead of kernel execution

intel / llvm

[SYCL][CUDA] reduce overhead of kernel execution #7351

Open zjin-lcf opened 1 year ago

zjin-lcf commented 1 year ago

The total execution times of the CUDA and SYCL programs are 1.04 s and 1.86 s, respectively, on an NVIDIA GPU (sm_86). The profiler shows that the SYCL kernel takes about as long on the device as the CUDA kernel does, so the overhead of executing the SYCL kernel, rather than the kernel itself, is significant in this case.

CUDA version: https://github.com/zjin-lcf/HeCBench/tree/master/src/reverse-cuda
SYCL version: https://github.com/zjin-lcf/HeCBench/tree/master/src/reverse-sycl
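
To put the gap in per-launch terms, here is a rough sketch of the arithmetic. It assumes the device kernel time is the same in both programs (as the profiler suggests), so the whole difference in total time is attributed to execution overhead. The `OverheadEstimate` struct, the `estimate_overhead` function, and the `launches` parameter are hypothetical names used for illustration only; they are not part of the benchmark or of any SYCL or CUDA API.

```cpp
#include <cassert>

// Hypothetical helper (not from the benchmark): splits the gap between two
// total run times into a per-launch figure. Assumes device kernel time is
// equal in both programs, so the gap is all execution overhead.
struct OverheadEstimate {
    double extra_seconds;        // SYCL total minus CUDA total
    double slowdown;             // SYCL total divided by CUDA total
    double extra_us_per_launch;  // extra time per kernel launch, in microseconds
};

OverheadEstimate estimate_overhead(double cuda_total_s,
                                   double sycl_total_s,
                                   long launches) {
    // Guard against division by zero; both inputs must be positive.
    assert(cuda_total_s > 0.0 && launches > 0);

    OverheadEstimate e;
    e.extra_seconds = sycl_total_s - cuda_total_s;
    e.slowdown = sycl_total_s / cuda_total_s;
    e.extra_us_per_launch = e.extra_seconds * 1e6 / static_cast<double>(launches);
    return e;
}
```

With the totals above, `estimate_overhead(1.04, 1.86, n)` gives about 0.82 s of extra time, roughly a 1.79x slowdown. The number of kernel launches `n` is not stated here and would have to be taken from how the benchmark was run.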

0x12CC commented 2 months ago

@zjin-lcf, could you please share a reproducer for this slowdown? The two links you have don't seem to work anymore.

zjin-lcf commented 2 months ago

I updated the links. Thanks.