oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0
720 stars 114 forks source link

[PROTOTYPE] Optimization for unique #1743

Closed danhoeflinger closed 1 month ago

danhoeflinger commented 1 month ago

Optimization for Unique. Avoid branch at every dereference of the input range to specialize for the zeroth element. Ended up uglier than I expected to avoid an "empty" scan.
We may be better off special casing the n=1 case, which would allow us to get rid of a lot of the checks for positive __active_subgroups.

Draft to show for comment. Need another pass for polish and formatting. May want to rethink it a bit as well as mentioned above.

mmichel11 commented 1 month ago

The general approach looks good to me here. I'm still trying to think if there's anything else we can do to reduce the amount of special cases we have throughout the code.

Do you have a rough idea of how much performance benefit this brings us? It'll help in deciding how much special casing unique is worth it.

danhoeflinger commented 1 month ago

The general approach looks good to me here. I'm still trying to think if there's anything else we can do to reduce the amount of special cases we have throughout the code.

Do you have a rough idea of how much performance benefit this brings us? It'll help in deciding how much special casing unique is worth it.

No not yet. I'll check some performance numbers here though and update, probably offline.

danhoeflinger commented 1 month ago

To summarize the offline discussion, this improves performance of unique patterns up to ~16% without harming performance of other scan patterns.