oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

Add reduce then scan algorithm for transform scan family #1762

Closed (danhoeflinger closed this 1 week ago)

danhoeflinger commented 1 month ago

This PR adds a generic reduce_then_scan algorithm for scan-like algorithms and uses it for the transform scan family of scan-like algorithms where it is beneficial.

This PR builds on two other PRs, #1769 and #1770, which lay the plumbing for these changes. Those two PRs were split out from this one to keep this change focused.

Note that subsequent PRs will move other families of scan-like algorithms (copy_if, partition, unique, ...) onto reduce_then_scan. Those families may justify some aspects of the reduce_then_scan kernel implementation that could seem overcomplicated for the transform scan family alone.


This PR targets #1770 to allow for a clean diff, and is part of the following sequence of PRs meant to be merged in order:

#1769 [MERGED] Relocate __lazy_ctor_storage to utils header
#1770 [MERGED] Use __result_and_scratch_storage within scan kernels
#1762 Add reduce_then_scan algorithm for transform scan family (This PR)
#1763 Make Copy_if family of APIs use reduce_then_scan algorithm
#1764 Make Partition family of APIs use reduce_then_scan algorithm
#1765 Make Unique family of APIs use reduce_then_scan algorithm

This work is a collaboration between @mmichel11, @adamfidel, and @danhoeflinger, and is based on an original prototype by Ted Painter.