Closed fcharras closed 1 year ago
Out of WIP. Will merge, since it's almost identical to what was approved in https://github.com/soda-inria/sklearn-numba-dpex/pull/88
The performance issue came from assuming the wrong layout for the grid of work items: SYCL uses row-major order, whereas CUDA, numba.cuda and numba_dpex all use column-major order.
The fix consisted of transposing the 2D work group size and swapping the dimension indices passed to the `dpex.get_*_id` calls.
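The effect of the ordering can be illustrated with a small pure-Python model (it does not use numba_dpex, and every name in it is hypothetical). It assumes, as described above, that the runtime enumerates work items with the first dimension varying fastest, and that the kernel reads a C-contiguous 2D array. It is only a sketch of the idea, not the code changed in this PR.

```python
# Pure-Python model of how a 2D work group is linearized.
# No numba_dpex needed; all names are hypothetical and for illustration only.

def work_item_ids(group_shape, order):
    """Return the 2D ids of a work group in the order they are linearized."""
    n0, n1 = group_shape
    if order == "row-major":  # last dimension varies fastest
        return [(i0, i1) for i0 in range(n0) for i1 in range(n1)]
    if order == "column-major":  # first dimension varies fastest
        return [(i0, i1) for i1 in range(n1) for i0 in range(n0)]
    raise ValueError(f"unknown order: {order}")


def flat_offsets(ids, n_cols, row_dim, col_dim):
    """Flat offset read by each work item in a C-contiguous array.

    row_dim / col_dim say which id dimension the kernel uses as the row
    index and as the column index, respectively.
    """
    return [item[row_dim] * n_cols + item[col_dim] for item in ids]


N_COLS = 8  # width of a hypothetical C-contiguous array

# What the kernel author expected: last dimension fastest,
# row = id(0), col = id(1), work group size (2, 4).
expected = flat_offsets(
    work_item_ids((2, 4), "row-major"), N_COLS, row_dim=0, col_dim=1
)

# What happens if the first dimension is actually the fastest one:
# consecutive work items jump between rows (strided accesses).
before_fix = flat_offsets(
    work_item_ids((2, 4), "column-major"), N_COLS, row_dim=0, col_dim=1
)

# The fix: transpose the work group size to (4, 2) and swap the id
# dimensions, so row = id(1) and col = id(0).
after_fix = flat_offsets(
    work_item_ids((4, 2), "column-major"), N_COLS, row_dim=1, col_dim=0
)

print("expected  :", expected)
print("before fix:", before_fix)
print("after fix :", after_fix)
```

In this model, consecutive work items read consecutive offsets again after the transposition, which is the access pattern the kernels were originally written for.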
It's WIP because the benchmarks show that those changes induce a 30% performance overhead; there seems to be an issue with 2D grids of work items in `numba_dpex` or `SYCL`. Opened an issue: https://github.com/IntelPython/numba-dpex/issues/941

TODO: find minimal reproducers.
This PR remains open as WIP in case those issues are eventually fixed.
Before considering merging the PR, please rebase the branch and check whether new kernels on the main branch could also benefit from the change.