Closed danhoeflinger closed 1 month ago
The general approach looks good to me here. I'm still trying to think if there's anything else we can do to reduce the amount of special cases we have throughout the code.
Do you have a rough idea of how much performance benefit this brings us? It'll help in deciding how much special casing unique is worth it.
The general approach looks good to me here. I'm still trying to think if there's anything else we can do to reduce the amount of special cases we have throughout the code.
Do you have a rough idea of how much performance benefit this brings us? It'll help in deciding how much special casing unique is worth it.
No not yet. I'll check some performance numbers here though and update, probably offline.
To summarize the offline discussion, this improves performance of unique patterns up to ~16% without harming performance of other scan patterns.
Optimization for Unique. Avoid branch at every dereference of the input range to specialize for the zeroth element. Ended up uglier than I expected to avoid an "empty" scan.
We may be better off special casing the
n=1
case, which would allow us to get rid of a lot of the checks for positive__active_subgroups
.Draft to show for comment. Need another pass for polish and formatting. May want to rethink it a bit as well as mentioned above.