This PR cleans up the code and resolves a few outstanding TODOs:
int loop variables have been changed to std::uint32_t to match oneDPL style.
__ndi.barrier() has been replaced with __dpl_sycl::__group_barrier(__ndi). The comments about performance concerns have been removed because barrier performance is a known, broader issue in oneDPL and this code does not use many barriers.
Some clang-format changes have been applied.
Removed the comment on SLM checks. SYCL requires a minimum of 32 KB of SLM, which is far more than we require in any circumstance.
__inputs_per_sub_group / __inputs_per_item are now adjusted only on the second-to-last block.
Removed a stray comment left in exclusive_scan_by_segment.pass.cpp.
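
For reviewers unfamiliar with the loop-variable convention, here is a minimal sketch of the kind of change involved. It is not taken from the diff: the helper name sum_first_n and its parameters are hypothetical and only illustrate indexing with std::uint32_t instead of int.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of this PR): sums at most the first n elements.
// The loop index is std::uint32_t rather than int, mirroring the style change
// described above. Comparing it against the unsigned n and values.size()
// avoids signed/unsigned comparison warnings.
inline std::uint32_t
sum_first_n(const std::vector<std::uint32_t>& values, std::uint32_t n)
{
    std::uint32_t total = 0;
    for (std::uint32_t i = 0; i < n && i < values.size(); ++i)
        total += values[i];
    return total;
}
```

An unsigned index also matches the unsigned types SYCL uses for work-item and group sizes, so fewer casts are needed at the boundary.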