Status: Closed (by mmichel11, 2 weeks ago)
We still need a strategy for "filling" elements, because the shift_group_right call requires some constructed element. Since we never use the shifted value, we could fill it with "garbage" or with the last valid element in the subgroup. I'm thinking about how to do this.
Yeah, this PR does not resolve this part of the issue, so it still depends on the identity. Currently, when v is unused in the scan, the code relies on whatever value v previously held, and v is initially set to the identity.
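To make the "last valid element" idea concrete, here is a minimal host-side sketch in plain C++. It is not SYCL and not the PR's code: make_lane_operands and model_shift_group_right are hypothetical helpers that only model what each lane would pass to, and receive from, a shift by delta. The point is that lanes without input can be given a constructed value by copying an element that already exists, so no identity is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: build one operand per lane of a sub-group.
// Lanes past the end of `input` copy the last valid element, so every lane
// has a constructed value without requiring an identity.
// Precondition: !input.empty() && input.size() <= sub_group_size.
template <typename T>
std::vector<T> make_lane_operands(const std::vector<T>& input, std::size_t sub_group_size)
{
    std::vector<T> lanes;
    lanes.reserve(sub_group_size);
    for (std::size_t lane = 0; lane < sub_group_size; ++lane)
        lanes.push_back(input[std::min(lane, input.size() - 1)]);
    return lanes;
}

// Hypothetical host-side model of a right shift across lanes:
// lane i receives the operand of lane i - delta. Lanes with i < delta keep
// their own operand here; this model treats that result as not meaningful.
template <typename T>
std::vector<T> model_shift_group_right(const std::vector<T>& lanes, std::size_t delta)
{
    std::vector<T> shifted(lanes);
    for (std::size_t lane = delta; lane < lanes.size(); ++lane)
        shifted[lane] = lanes[lane - delta];
    return shifted;
}
```

With three valid inputs {3, 5, 7} and an assumed sub-group size of 8, the operands become {3, 5, 7, 7, 7, 7, 7, 7}. The values received by lanes without input are never used in the scan, so any copy of an existing element would do.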
This PR is targeted to the prototype branch for the reduce-then-scan implementation.
Previously, the implementation was reliant on work-group sizes being >= 1024 so that the number of sub-groups is divisible by the sub-group size. This PR generalizes the work-group size requirement and allows for the case where the number of sub-groups is not divisible by the sub-group size. There is a small performance penalty when this is the case.
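As a rough illustration of the divisibility condition, the sketch below computes how many sub-groups a work-group contains and how they split into full batches of sub-group size plus a remainder. It assumes that per-sub-group results are processed sub_group_size at a time, which is an assumption made for this example and not something the description states. SubGroupLayout and compute_layout are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical summary of how the sub-groups of one work-group divide into
// batches of sub_group_size.
struct SubGroupLayout
{
    std::size_t num_sub_groups; // work_group_size / sub_group_size
    std::size_t full_batches;   // batches with exactly sub_group_size entries
    std::size_t remainder;      // entries left over for a partial batch
};

// Precondition: sub_group_size > 0 and work_group_size is a multiple of it.
constexpr SubGroupLayout compute_layout(std::size_t work_group_size, std::size_t sub_group_size)
{
    const std::size_t num_sub_groups = work_group_size / sub_group_size;
    return SubGroupLayout{num_sub_groups, num_sub_groups / sub_group_size,
                          num_sub_groups % sub_group_size};
}
```

For example, a work-group of 1024 with a sub-group size of 32 gives 32 sub-groups and a remainder of 0, while a work-group of 256 with the same sub-group size gives 8 sub-groups and a remainder of 8, which is the non-divisible case this PR now allows.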
Additionally, the num_remaining parameter adds support for sub-group scans when fewer than a full sub-group's worth of input elements is available, which avoids the need to pad with identity elements.
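The following host-side sketch shows why bounding the scan by the number of remaining elements removes the need for identity padding. It is a plain C++ model of a shift-based (Hillis-Steele style) inclusive scan, not the PR's SYCL code. partial_inclusive_scan is a hypothetical name, and its num_remaining argument only mirrors the idea of the parameter described above, not its actual signature.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: inclusive scan over the first `num_remaining` lanes.
// Lanes at or beyond `num_remaining` are neither read nor written, so they
// may hold arbitrary values and no identity element is required.
template <typename T, typename BinaryOp>
std::vector<T> partial_inclusive_scan(std::vector<T> lanes, std::size_t num_remaining, BinaryOp op)
{
    for (std::size_t delta = 1; delta < num_remaining; delta *= 2)
    {
        // The snapshot models all lanes exchanging values in lock step.
        const std::vector<T> prev(lanes);
        for (std::size_t lane = delta; lane < num_remaining; ++lane)
            lanes[lane] = op(prev[lane - delta], prev[lane]); // left operand first
    }
    return lanes;
}
```

Scanning {1, 2, 3, 4, 5, 99, 99, 99} with addition and num_remaining = 5 yields {1, 3, 6, 10, 15, 99, 99, 99}. The trailing 99 values stand in for lanes without input and pass through untouched.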