oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

[PROTOTYPE] Two-pass scan - Generalize work-group size support #1638

Closed mmichel11 closed 2 weeks ago

mmichel11 commented 2 weeks ago

This PR is targeted to the prototype branch for the reduce-then-scan implementation.

Previously, the implementation relied on work-group sizes being >= 1024 so that the number of sub-groups was divisible by the sub-group size. This PR relaxes that requirement and handles the case where the number of sub-groups is not divisible by the sub-group size, at a small performance cost.
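To make the divisibility condition concrete, here is a small host-side sketch of the arithmetic. It is not code from this PR: `SubGroupLayout` and `make_layout` are hypothetical names, the sub-group size of 32 in the usage note is an assumption (the description does not state it), and the sketch assumes the per-sub-group partial results are processed `sub_group_size` at a time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of oneDPL): describes how a work-group
// splits into sub-groups, and how many of the per-sub-group partial results
// can be processed in full-width passes of sub_group_size lanes.
struct SubGroupLayout {
    std::size_t num_sub_groups; // sub-groups per work-group
    std::size_t full_passes;    // passes in which every lane holds a partial
    std::size_t tail_lanes;     // active lanes in a final partial pass (0 if none)
};

inline SubGroupLayout make_layout(std::size_t work_group_size,
                                  std::size_t sub_group_size) {
    // Assumption: the work-group size is a multiple of the sub-group size.
    assert(sub_group_size > 0 && work_group_size % sub_group_size == 0);
    const std::size_t n = work_group_size / sub_group_size;
    return {n, n / sub_group_size, n % sub_group_size};
}
```

Under these assumptions, `make_layout(1024, 32)` gives 32 sub-groups, one full pass and no tail, while `make_layout(512, 32)` gives 16 sub-groups, no full pass and a 16-lane tail. The tail is the non-divisible case this PR is meant to handle.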

Additionally, sub-group scans now support inputs smaller than a full sub-group via the num_remaining parameter, which avoids the need to pad with identity elements.
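As a rough illustration of the idea (not the kernel code in this PR), the sketch below simulates a shift-based inclusive scan over the lanes of one sub-group on the host. The function name `masked_sub_group_inclusive_scan` is hypothetical, and the vector snapshot only stands in for a device-side `shift_group_right`. Only the first `num_remaining` lanes hold valid input; they combine only with lanes to their left, so no identity element is needed to pad the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation of a Hillis-Steele inclusive scan across the lanes of
// one sub-group. `lanes` holds one value per lane; only the first
// `num_remaining` lanes carry valid input. Hypothetical sketch, not oneDPL code.
template <typename T, typename BinaryOp>
void masked_sub_group_inclusive_scan(std::vector<T>& lanes,
                                     std::size_t num_remaining, BinaryOp op) {
    assert(num_remaining <= lanes.size());
    for (std::size_t shift = 1; shift < lanes.size(); shift <<= 1) {
        // The snapshot models a right shift across lanes: each lane reads the
        // value held `shift` lanes to its left before any lane is updated.
        const std::vector<T> shifted_from = lanes;
        // Only valid lanes apply the operator. Their source lane index
        // (lane - shift) is always below num_remaining, so it is valid too.
        for (std::size_t lane = shift; lane < num_remaining; ++lane) {
            lanes[lane] = op(shifted_from[lane - shift], lanes[lane]);
        }
    }
}
```

With eight lanes holding `{1, 2, 3, 4, x, x, x, x}` and `num_remaining = 4`, an addition operator leaves `{1, 3, 6, 10}` in the valid lanes and never touches the `x` values. On a real device the shift still moves a value out of every lane, including the inactive ones, so those lanes must hold some constructed object even though the result ignores it.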

danhoeflinger commented 2 weeks ago

We still need to come up with some strategy for "filling" elements because we need some constructed element in the shift_group_right call. Since we are not "using" the shifted value, we can possibly fill it with "garbage" or the last valid element in the subgroup. I'm thinking about how to do this.

mmichel11 commented 2 weeks ago

Yeah, this PR does not resolve that part of the issue, so it is still dependent on the identity. Currently, when v is unused in the scan, it keeps whatever value it previously held, which is initially set to the identity.