oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

Make Unique family of APIs use reduce_then_scan #1765

Closed danhoeflinger closed 1 week ago

danhoeflinger commented 1 month ago

This PR changes the unique family of scan-like APIs to use reduce_then_scan when it is beneficial.
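For readers unfamiliar with the term, the following is a minimal serial sketch of the general reduce-then-scan idea, not oneDPL's kernel. It assumes an inclusive prefix sum over int; the function name and the tile_size parameter are hypothetical and chosen only for illustration. Each tile is first reduced to a single partial sum, the partial sums are scanned to produce a carry-in per tile, and each tile is then scanned starting from its carry-in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative serial model of a reduce-then-scan prefix sum.
// The name and tile_size are hypothetical; this is not oneDPL code.
std::vector<int> inclusive_scan_reduce_then_scan(const std::vector<int>& in, std::size_t tile_size)
{
    assert(tile_size > 0);
    const std::size_t n = in.size();
    const std::size_t num_tiles = (n + tile_size - 1) / tile_size;

    // Phase 1 (reduce): compute one partial sum per tile.
    std::vector<int> tile_sum(num_tiles, 0);
    for (std::size_t t = 0; t < num_tiles; ++t)
    {
        const std::size_t begin = t * tile_size;
        const std::size_t end = std::min(n, begin + tile_size);
        for (std::size_t i = begin; i < end; ++i)
            tile_sum[t] += in[i];
    }

    // Phase 2: an exclusive scan over the tile sums gives each tile its carry-in.
    std::vector<int> carry(num_tiles, 0);
    int running = 0;
    for (std::size_t t = 0; t < num_tiles; ++t)
    {
        carry[t] = running;
        running += tile_sum[t];
    }

    // Phase 3 (scan): each tile scans its own elements, seeded with its carry-in.
    std::vector<int> out(n);
    for (std::size_t t = 0; t < num_tiles; ++t)
    {
        const std::size_t begin = t * tile_size;
        const std::size_t end = std::min(n, begin + tile_size);
        int acc = carry[t];
        for (std::size_t i = begin; i < end; ++i)
        {
            acc += in[i];
            out[i] = acc;
        }
    }
    return out;
}
```

In this model, phases 1 and 3 touch each tile independently, which is what makes the pattern attractive for parallel execution; the tiles are visited in a plain loop here only to keep the sketch short.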

This PR also removes the __pattern_scan_copy functions because they are no longer used.
Algorithm selection now happens at the level of __parallel_[copy_if/partition/unique]_copy, so "scan_copy" is no longer needed at the pattern level.

All algorithm selection decisions for unique move into __parallel_unique_copy, and the range API now calls this function as well. This unifies algorithm selection and provides performance improvements to the ranges API.

Unique requires some constexpr special casing in the kernel. The _GenMask for unique compares each element with its predecessor, which would underflow when index == 0, and guarding against that would cost an extra branch for each element. To avoid the branch, we special-case the kernels for the unique family APIs to skip the 0th element, always copy it, and start the scan at element 1. This handles the copy of the 0th element without any additional kernel launches. The n == 1 case is handled separately with a simple copy call.
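The special casing described above can be sketched serially as follows. This is a simplified model under stated assumptions, not the oneDPL kernel: the function name unique_copy_sketch is hypothetical, equality is checked with operator!=, and the mask, scan, and copy steps run as ordinary loops rather than inside a single kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial model of the unique special casing; not oneDPL internals.
template <typename T>
std::vector<T> unique_copy_sketch(const std::vector<T>& in)
{
    const std::size_t n = in.size();
    if (n == 0)
        return {};
    if (n == 1)
        return in; // n == 1: a simple copy, no scan needed

    // Mask covers elements 1..n-1 only: mask[i - 1] is 1 when in[i] differs
    // from in[i - 1]. Since i starts at 1, in[i - 1] never underflows and no
    // per-element "i == 0" branch is required.
    std::vector<std::size_t> mask(n - 1);
    for (std::size_t i = 1; i < n; ++i)
        mask[i - 1] = (in[i] != in[i - 1]) ? 1 : 0;

    // Exclusive scan of the mask: output offset of each kept element,
    // counted from the slot after element 0.
    std::vector<std::size_t> offset(n - 1);
    std::size_t running = 0;
    for (std::size_t k = 0; k + 1 < n; ++k)
    {
        offset[k] = running;
        running += mask[k];
    }
    assert(running + 1 <= n); // output can never be longer than the input

    std::vector<T> out(running + 1);
    out[0] = in[0]; // element 0 is always copied
    for (std::size_t i = 1; i < n; ++i)
        if (mask[i - 1] != 0)
            out[offset[i - 1] + 1] = in[i];
    return out;
}
```

For example, calling unique_copy_sketch on {1, 1, 2, 2, 2, 3, 1} keeps element 0 unconditionally and then the three elements that differ from their predecessor, yielding {1, 2, 3, 1}.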


This PR is targeted to #1764, to allow for a clean diff, and is a part of the following sequence of PRs meant to be merged in order:

#1769 [MERGED] Relocate __lazy_ctor_storage to utils header

#1770 [MERGED] Use __result_and_scratch_storage within scan kernels

#1762 Add reduce_then_scan algorithm for transform scan family

#1763 Make Copy_if family of APIs use reduce_then_scan algorithm

#1764 Make Partition family of APIs use reduce_then_scan algorithm

#1765 Make Unique family of APIs use reduce_then_scan algorithm (This PR)

This work is a collaboration between @mmichel11, @adamfidel, and @danhoeflinger, and is based upon an original prototype by Ted Painter.