oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

Make Copy_if family of APIs use reduce_then_scan algorithm #1763

Closed: danhoeflinger closed this 1 week ago

danhoeflinger commented 1 month ago

This PR makes the copy_if family of APIs use the reduce_then_scan algorithm where it is beneficial.

Moves all algorithm selection decisions into __parallel_copy_if, and changes the range API to call this function as well. Adds support for an assignment operator to the single-work-group copy_if. Together, these changes unify algorithm selection and bring performance improvements to the ranges API.

Adds the __parallel_reduce_then_scan_copy function, which partition and unique will use in the future to take advantage of shared functionality.
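For readers unfamiliar with the approach, here is a minimal sequential sketch of how a reduce-then-scan copy_if can be structured. It is not the oneDPL implementation: the function name `reduce_then_scan_copy_if`, the `block_size` parameter, and the use of `std::vector` are assumptions made for illustration, and each "block" stands in for work that a device would process in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of oneDPL: a sequential model of a
// reduce-then-scan copy_if over fixed-size blocks of the input.
template <typename T, typename Pred>
std::vector<T> reduce_then_scan_copy_if(const std::vector<T>& in, Pred pred,
                                        std::size_t block_size = 4)
{
    if (block_size == 0)
        block_size = 1; // guard against division by zero below

    const std::size_t n = in.size();
    const std::size_t num_blocks = (n + block_size - 1) / block_size;

    // Phase 1 (reduce): each block counts how many of its elements pass pred.
    std::vector<std::size_t> block_counts(num_blocks, 0);
    for (std::size_t b = 0; b < num_blocks; ++b)
    {
        const std::size_t first = b * block_size;
        const std::size_t last = std::min(first + block_size, n);
        for (std::size_t i = first; i < last; ++i)
            block_counts[b] += pred(in[i]) ? 1 : 0;
    }

    // Phase 2 (scan of block totals): an exclusive prefix sum gives the
    // output index at which each block starts writing.
    std::vector<std::size_t> block_offsets(num_blocks, 0);
    std::size_t total = 0;
    for (std::size_t b = 0; b < num_blocks; ++b)
    {
        block_offsets[b] = total;
        total += block_counts[b];
    }

    // Phase 3 (scan and write): each block re-evaluates pred and writes the
    // selected elements starting at its own offset, preserving input order.
    std::vector<T> out(total);
    for (std::size_t b = 0; b < num_blocks; ++b)
    {
        const std::size_t first = b * block_size;
        const std::size_t last = std::min(first + block_size, n);
        std::size_t pos = block_offsets[b];
        for (std::size_t i = first; i < last; ++i)
        {
            if (pred(in[i]))
                out[pos++] = in[i];
        }
        // The block must have written exactly as many elements as it counted.
        assert(pos == block_offsets[b] + block_counts[b]);
    }
    return out;
}
```

Because every block knows its starting offset before it writes anything, the blocks in phases 1 and 3 do not depend on each other, which is what makes this structure attractive for parallel execution. The sketch evaluates the predicate twice per element; whether a real kernel does the same is not stated in this PR.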


This PR is targeted at #1762 to allow for a clean diff, and is part of the following sequence of PRs, which are meant to be merged in order:

#1769 [MERGED] Relocate __lazy_ctor_storage to utils header

#1770 [MERGED] Use __result_and_scratch_storage within scan kernels

#1762 Add reduce_then_scan algorithm for transform scan family

#1763 Make Copy_if family of APIs use reduce_then_scan algorithm (This PR)

#1764 Make Partition family of APIs use reduce_then_scan algorithm

#1765 Make Unique family of APIs use reduce_then_scan algorithm

This work is a collaboration between @mmichel11, @adamfidel, and @danhoeflinger, and is based upon an original prototype by Ted Painter.