oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

Optimize atomic operations in scan kernel template #1528

Open adamfidel opened 2 months ago

adamfidel commented 2 months ago

It may be possible to reduce the number of atomic operations in the scan kernel template algorithm. Currently, we use atomic loads/stores for both the status flags and the status values, but it might be possible to use atomics only for the flags and rely on the standard happens-before relation established by the flag operations to guarantee that a reader who observes an updated flag also observes the corresponding value.
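To illustrate the idea, here is a minimal host-side sketch in standard C++, not the oneDPL kernel code. `TileStatus`, `publish`, `wait_and_read`, and `publish_and_consume` are hypothetical names, and the sketch assumes the flag is written with a release store and read with an acquire load, which may not match what the scan kernel template does today.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for one status slot; not oneDPL's actual layout.
struct TileStatus {
    std::uint32_t value = 0;             // plain, non-atomic payload
    std::atomic<std::uint32_t> flag{0};  // 0 = not ready, 1 = ready
};

// Producer: write the value first, then publish it with a release store.
inline void publish(TileStatus& s, std::uint32_t v) {
    s.value = v;
    s.flag.store(1, std::memory_order_release);
}

// Consumer: spin on acquire loads of the flag. Once it reads 1, the
// release/acquire pair orders the producer's write to `value` before
// this read, so the non-atomic access is not a data race.
inline std::uint32_t wait_and_read(TileStatus& s) {
    while (s.flag.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    return s.value;
}

// Runs one producer and one consumer thread and returns what the
// consumer observed.
inline std::uint32_t publish_and_consume(std::uint32_t v) {
    TileStatus s;
    std::uint32_t seen = 0;
    std::thread consumer([&] { seen = wait_and_read(s); });
    std::thread producer([&] { publish(s, v); });
    producer.join();
    consumer.join();  // join makes the write to `seen` visible here
    assert(s.flag.load(std::memory_order_relaxed) == 1);
    return seen;
}
```

The ordering matters on both sides: the value must be written before the flag store, and read only after the flag load has returned the "ready" state. Whether the same guarantee carries over to the device-side atomics used in the kernel would still need to be checked against the memory orders and scopes it actually uses.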

Discussed in https://github.com/oneapi-src/oneDPL/pull/1320#pullrequestreview-2019240324.

danhoeflinger commented 1 month ago

I agree that the values here shouldn't need to be atomics, and the atomics for the status flags can provide everything we need.