oneapi-src / oneDPL

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html
Apache License 2.0

merge-sort: reduce the number of kernels to compile #1740

Open dmitriy-sobolev opened 1 month ago

dmitriy-sobolev commented 1 month ago

This PR splits the single submitter, which launched all three kernels, into three separate submitters. As a result, 7 kernels are compiled (4 leaf, 2 global, 1 copy) instead of 24 (8 leaf, 8 global, 8 copy: 4x for the _LeafSortKernel options times 2x for the _IndexT options).

It implements the following TODO (see #1735):

  // TODO: split the submitter into multiple ones to avoid extra compilation of kernels
  // - _LeafSortKernel does not need _IndexT
  // - _GlobalSortKernel does not need _LeafDPWI and _LeafWGS
  // - _CopyBackKernel does not need either of them
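
The kernel counts above can be reproduced with a small standalone sketch. It is not the oneDPL code: `LeafKernel`, `GlobalKernel`, `CopyKernel`, `LeafConfig`, and the helper functions are hypothetical stand-ins, and a set of `std::type_index` values is used to model "distinct kernels the compiler must build". It assumes 4 leaf configurations and 2 index types, as stated in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <typeindex>
#include <typeinfo>

// Hypothetical kernel name types (illustration only, not oneDPL identifiers).
template <typename... Params> struct LeafKernel {};
template <typename... Params> struct GlobalKernel {};
template <typename... Params> struct CopyKernel {};

// Hypothetical tag standing in for one leaf-sort configuration.
template <int N> struct LeafConfig {};

// Each distinct type recorded here models one kernel to compile.
using Registry = std::set<std::type_index>;

template <typename Kernel>
void record(Registry& r) { r.insert(std::type_index(typeid(Kernel))); }

// Before: one submitter, so every kernel name carries every parameter.
template <typename LeafCfg, typename IndexT>
void combined_submit(Registry& r) {
    record<LeafKernel<LeafCfg, IndexT>>(r);
    record<GlobalKernel<LeafCfg, IndexT>>(r);
    record<CopyKernel<LeafCfg, IndexT>>(r);
}

// After: each submitter is parameterized only by what its kernel needs.
template <typename LeafCfg>
void leaf_submit(Registry& r) { record<LeafKernel<LeafCfg>>(r); }

template <typename IndexT>
void global_submit(Registry& r) { record<GlobalKernel<IndexT>>(r); }

inline void copy_submit(Registry& r) { record<CopyKernel<>>(r); }

template <typename LeafCfg, typename IndexT>
void split_submit(Registry& r) {
    leaf_submit<LeafCfg>(r);
    global_submit<IndexT>(r);
    copy_submit(r);
}

// Instantiate all 4 leaf configurations for one index type.
template <typename IndexT>
void all_combined(Registry& r) {
    combined_submit<LeafConfig<0>, IndexT>(r);
    combined_submit<LeafConfig<1>, IndexT>(r);
    combined_submit<LeafConfig<2>, IndexT>(r);
    combined_submit<LeafConfig<3>, IndexT>(r);
}

template <typename IndexT>
void all_split(Registry& r) {
    split_submit<LeafConfig<0>, IndexT>(r);
    split_submit<LeafConfig<1>, IndexT>(r);
    split_submit<LeafConfig<2>, IndexT>(r);
    split_submit<LeafConfig<3>, IndexT>(r);
}

// Number of distinct kernels with the combined submitter.
inline std::size_t count_combined() {
    Registry r;
    all_combined<std::uint32_t>(r);
    all_combined<std::uint64_t>(r);
    return r.size();
}

// Number of distinct kernels with the split submitters.
inline std::size_t count_split() {
    Registry r;
    all_split<std::uint32_t>(r);
    all_split<std::uint64_t>(r);
    return r.size();
}
```

In this model `count_combined()` yields 24 (4 x 2 x 3) and `count_split()` yields 7 (4 + 2 + 1), because identical specializations such as `CopyKernel<>` are recorded only once regardless of how many call sites request them.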