oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

[PROTOTYPE] Two-pass scan - Support in-place exclusive scans and update tests #1647

Closed by mmichel11 1 week ago

mmichel11 commented 2 weeks ago

This PR is targeted to the development branch for a new scan implementation.

I evaluated the scan implementation for the in-place case, where the input and output buffers are the same. Inclusive scan has no issues with the current implementation. Exclusive scan has an issue in the multi-block case: to compute the current block's carry-in, we fetch the last element of the previous block's input. In the in-place case, that input value has already been overwritten with its exclusive scan result.
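The hazard can be reproduced with a small sequential model. This is only a sketch, not the oneDPL kernels: `naive_blocked_exclusive_scan` is a hypothetical helper that assumes addition as the scan operation and assumes each block's carry-in is rebuilt as the previous block's last output plus the previous block's last input, read back from the input buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential model of a multi-block exclusive scan (operator +).
// `in` and `out` may point to the same buffer to model the in-place case.
void naive_blocked_exclusive_scan(const int* in, int* out, std::size_t n,
                                  std::size_t block_size)
{
    assert(block_size > 0);
    for (std::size_t begin = 0; begin < n; begin += block_size)
    {
        const std::size_t end = std::min(begin + block_size, n);

        // Carry-in = previous block's last exclusive result + its last input.
        // If in == out, in[begin - 1] no longer holds the input: it was
        // overwritten with the exclusive result, so the carry is wrong.
        int running = 0;
        if (begin > 0)
            running = out[begin - 1] + in[begin - 1];

        for (std::size_t i = begin; i < end; ++i)
        {
            const int value = in[i]; // read before write, safe within a block
            out[i] = running;
            running += value;
        }
    }
}
```

Calling this with distinct buffers yields the expected exclusive scan, while passing the same pointer for `in` and `out` with more than one block yields wrong values from the second block onward. A single block is unaffected because no carry-in is fetched.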

To fix this issue, a work item in the last sub-group of the second kernel writes the block's last input element to temporary storage before the scan results are written, and the next block fetches it from there. This adds no overhead, since the last sub-group would otherwise be idle while waiting for the first sub-group to finish its work.
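The idea of saving the block's last input before it is overwritten can be illustrated with a sequential sketch. It does not model sub-groups or kernels; `blocked_exclusive_scan_saved_tail` and the `saved_tails` vector are hypothetical stand-ins for the real implementation and its temporary storage, and addition is assumed as the scan operation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential model: each block stores its last input element in
// temporary storage before writing any results, so the next block can build
// its carry-in even when `in` and `out` are the same buffer.
void blocked_exclusive_scan_saved_tail(const int* in, int* out, std::size_t n,
                                       std::size_t block_size)
{
    assert(block_size > 0);
    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    std::vector<int> saved_tails(num_blocks); // stand-in for temporary storage

    for (std::size_t block = 0; block < num_blocks; ++block)
    {
        const std::size_t begin = block * block_size;
        const std::size_t end = std::min(begin + block_size, n);

        // Save the last input of this block before any output is written.
        saved_tails[block] = in[end - 1];

        // Carry-in uses the saved input of the previous block, not in[begin - 1].
        int running = 0;
        if (block > 0)
            running = out[begin - 1] + saved_tails[block - 1];

        for (std::size_t i = begin; i < end; ++i)
        {
            const int value = in[i]; // read before write, safe within a block
            out[i] = running;
            running += value;
        }
    }
}
```

With this change the in-place and out-of-place calls produce the same result in the model, including when the input size is not a multiple of the block size.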

I have added test cases for out-of-place and in-place inclusive and exclusive scans to the USM tests. The test build and run times are starting to grow, so I don't think we should add them to each case.

mmichel11 commented 1 week ago

@adamfidel Was there any additional review you wanted to do here?