Consider adding sorting intrinsics.

halide / Halide

a language for fast, portable data-parallel computation

https://halide-lang.org

Other

5.9k stars 1.07k forks source link

Consider adding sorting intrinsics. #7258

Open zvookin opened 1 year ago

zvookin commented 1 year ago

This PR on libcxx updates small size sorting networks builtin to the library. (It is based on improvements found via ML techniques.) I believe up to size 5 is supported and only for integer types.

We could add intrinsics for each of a set of small sizes, a variadic intrinsic, or one that takes some sort of Func expression and a constant size and does the sort. Or perhaps some combination.

This is still fixed size. We could generalize to variable size with a fixed upper limit. And in that model adding more intrinsics to handle variable length tasks might make sense, but fixed size sorting networks is likely a lot less design.

abadams commented 1 year ago

Maybe I'm missing something, but I'm a bit dubious of that PR. It seems that the only difference between it and the already-known optimal sorting networks is the partially_sorted_swap function, which could be implemented using four min/max instructions but is instead implemented using two comparisons and four ternary operators. I think it only makes sense for architectures without a min/max instruction for your desired comparison operator. If it successfully compiles down to min and max instructions then, well, the optimal sorting networks for these sizes are already known without any RL.

abadams commented 1 year ago

We could totally add small sort functions based on the known optimal sorting networks though. I have a table of them somewhere, along with small find nth-of-m networks.

abadams commented 1 year ago

Looking my notes, https://pages.ripco.net/~jgamble/nw.html is where I got my known-optimal networks up to size 16

Google search also turns up https://www.angelfire.com/blog/ronz/Articles/999SortingNetworksReferen.html but it has a bug in one of its networks!

zvookin commented 1 year ago

The PR is mostly referenced to demonstrate that libcxx is doing something in this area, not necessarily saying we should use those networks. (I am assuming the sort[2345] functions are exported but maybe they're internal only. Also a question as to how vectorization is handled.) I suppose if the optimal network depends on a lot of context, it gets trickier.

Is there any reason not to add these? It seems like doing them as generators or library functions would be less productive than as intrinsics. There is a question of how much use they'd get, but small sorts do show up in image processing and ML stuff.

abadams commented 1 year ago

We could add a sort function to IROperator.cpp/h that takes and returns a vector of Exprs and just constructs the appropriate mess of min/max nodes. That scales reasonably for the sorts of sizes people are likely to use.