It may be possible to reduce the number of atomic operations in the scan kernel template algorithm. Currently, we use atomic loads/stores for both the status flags and the status values. It might be possible to use atomics only for the flags and rely on the standard happens-before relations to guarantee that a value written before its flag is set is visible to any reader that observes the updated flag.
Discussed in https://github.com/oneapi-src/oneDPL/pull/1320#pullrequestreview-2019240324.
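
To make the idea concrete, here is a minimal sketch of the flag-only pattern in standard host C++ (`std::atomic` and `std::thread`), not SYCL kernel code. `TileStatus`, `publish`, `wait_and_read`, and `run_demo` are hypothetical names chosen for illustration and do not reflect the actual oneDPL data layout. The sketch assumes the flag is written with release ordering and read with acquire ordering. Whether the same guarantee holds for the device memory model and the memory orders used in the kernel is exactly the open question here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for one tile's status slot (not oneDPL's real layout).
struct TileStatus {
    std::atomic<std::uint32_t> flag{0}; // 0 = not ready, 1 = value published
    std::uint32_t value = 0;            // plain, non-atomic payload
};

// Producer: write the value with an ordinary store, then publish it by
// setting the flag with release ordering.
void publish(TileStatus& s, std::uint32_t v) {
    s.value = v;
    s.flag.store(1, std::memory_order_release);
}

// Consumer: spin until the flag is observed with acquire ordering.
// The release store synchronizes with this acquire load, so the earlier
// write to `value` happens-before the plain read below.
std::uint32_t wait_and_read(TileStatus& s) {
    while (s.flag.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    return s.value;
}

// Runs one producer and one consumer on the same slot and returns the
// value the consumer observed.
std::uint32_t run_demo() {
    TileStatus s;
    std::uint32_t seen = 0;
    std::thread consumer([&] { seen = wait_and_read(s); });
    std::thread producer([&] { publish(s, 42); });
    producer.join();
    consumer.join();
    return seen;
}
```

With relaxed ordering on the flag, the plain read of `value` would be a data race, so any change along these lines would need to check which memory orders and scopes the flag accesses currently use.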