oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0
722 stars 113 forks source link

A one group version of merge_sort #1113

Open MikeDvorskiy opened 1 year ago

MikeDvorskiy commented 1 year ago
          I think its close, and there are some decent opportunities for short term improvement.

In the long term, a one group version of this could be a big improvement for small sizes of n, shrinking down to a single kernel launches from O(log(n)) kernel launches where we would be in a single or only a few workgroups anyway.

Originally posted by @danhoeflinger in https://github.com/oneapi-src/oneDPL/pull/1098#pullrequestreview-1594106537

danhoeflinger commented 1 year ago

To the same goal of future performance improvement of merge sort: It seems worthwhile to investigate leaf size changes based on total input size to avoid the extra copy from temporary data to output data, originally from a comment from @dmitriy-sobolev https://github.com/oneapi-src/oneDPL/pull/1098#discussion_r1305549820

danhoeflinger commented 1 year ago

Additionally, it would be good to re-investigate selection of leaf size and its effect on performance. I believe when this was looked at previously, the number of iterations of the merge phase was not being adjusted accordingly. I expect this should provide some benefit, especially in the context of avoiding the extra copy from temporary data.

dmitriy-sobolev commented 2 months ago

Leaf size was enlarged as a part of https://github.com/oneapi-src/oneDPL/pull/1735. The performance benefit was significant for GPUs. However, the question regarding adjustment of a leaf size to avoid extra copy still remains open.