oneapi-src / SYCLomatic

[SYCLomatic] [stable_]partition performance improvement #2215

Closed danhoeflinger closed 1 week ago

danhoeflinger commented 1 month ago

Performance improvement for in-place partition and stable_partition.

This avoids the extra copy of the mask and reduces the number of kernels. Internally we now rely on [stable_]partition_copy instead of stable_partition, which itself does a partition_copy followed by a copy. This lets us avoid writing to the mask list without having to allocate a temporary mask buffer and copy into it.
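For readers who want the shape of the approach, here is a minimal host-side sketch in standard C++, not the actual SYCLomatic code. The function name `stable_partition_with_mask` and the use of `std::vector` are assumptions made for illustration only. It shows a partition_copy-style pass that only reads the mask, followed by two copies back into the input range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not part of the SYCLomatic API.
// Stable in-place partition of `data` driven by a read-only `mask`
// (nonzero means the element belongs to the "true" group).
// Returns the number of elements in the "true" group.
template <typename T>
std::size_t stable_partition_with_mask(std::vector<T>& data,
                                       const std::vector<int>& mask) {
    assert(data.size() == mask.size());

    // partition_copy step: split into two temporaries, preserving order.
    // The mask is only read, so no temporary mask buffer is needed.
    std::vector<T> true_out;
    std::vector<T> false_out;
    true_out.reserve(data.size());
    false_out.reserve(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        (mask[i] ? true_out : false_out).push_back(data[i]);
    }

    // Two separate copies back into the original range.
    auto mid = std::copy(true_out.begin(), true_out.end(), data.begin());
    std::copy(false_out.begin(), false_out.end(), mid);
    return true_out.size();
}
```

Calling this on `{1, 2, 3, 4, 5, 6}` with mask `{0, 1, 0, 1, 1, 0}` leaves the data as `{2, 4, 5, 1, 3, 6}` and the mask unchanged.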

The change for partition reverses the order of the "false" list, so the result is no longer stable. Stability is not required for this API, though, and this lets us skip one copy kernel.
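One way to picture why reversing the "false" list saves a copy is the host-side sketch below. It is an assumption about the general technique, not a transcription of the SYCLomatic implementation, and the name `partition_with_mask` is hypothetical. "True" elements fill a single temporary from the front while "false" elements fill it from the back, so one contiguous copy moves everything back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not part of the SYCLomatic API.
// Unstable in-place partition driven by a read-only `mask`.
// Requires T to be default-constructible (for the temporary buffer).
// Returns the number of elements in the "true" group.
template <typename T>
std::size_t partition_with_mask(std::vector<T>& data,
                                const std::vector<int>& mask) {
    assert(data.size() == mask.size());

    std::vector<T> tmp(data.size());
    auto true_it = tmp.begin();    // "true" elements grow from the front
    auto false_it = tmp.rbegin();  // "false" elements grow from the back
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (mask[i]) {
            *true_it++ = data[i];
        } else {
            *false_it++ = data[i];  // ends up in reverse input order
        }
    }

    // A single copy back, since both groups live in one contiguous buffer.
    std::copy(tmp.begin(), tmp.end(), data.begin());
    return static_cast<std::size_t>(true_it - tmp.begin());
}
```

With the input `{1, 2, 3, 4, 5, 6}` and mask `{0, 1, 0, 1, 1, 0}`, the "true" group stays in order as `2, 4, 5`, while the "false" group comes out as `6, 3, 1`, which is why the partition test needs its stability requirement relaxed.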

It would be possible to write a specialized SYCL copy kernel that copies two different sequences into one output, but for now we just call copy twice.

Requires https://github.com/oneapi-src/SYCLomatic-test/pull/752 to be merged first for passing tests (relaxing stability requirement of partition test).