soda-inria / sklearn-numba-dpex

Experimental plugin for scikit-learn to be able to run (some estimators) on Intel GPUs via numba-dpex.
BSD 3-Clause "New" or "Revised" License

ENH Use a 2D grid of work items where applicable #98

Closed fcharras closed 1 year ago

fcharras commented 1 year ago

It's WIP because the benchmarks show that these changes induce a 30% performance overhead. There seems to be an issue with 2D grids of work items in numba_dpex or SYCL.

Opened an issue: https://github.com/IntelPython/numba-dpex/issues/941. TODO: find minimal reproducers.

This PR remains open as WIP in case those issues are eventually fixed.

Before considering merging the PR, please rebase the branch and check whether new kernels on the main branch could also benefit from the change.

fcharras commented 1 year ago

Out of WIP. Will merge since it's almost identical to what was approved in https://github.com/soda-inria/sklearn-numba-dpex/pull/88

The performance issue came from assuming the wrong ordering for the grid of work items: SYCL uses row-major order, whereas CUDA, numba.cuda and numba_dpex all use column-major order.

The fix consisted of transposing the 2D work group size and swapping the dimensions in the dpex.get_*_id calls.
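To illustrate the ordering mismatch, here is a minimal pure-Python sketch. It is not code from this PR and does not call numba_dpex. The helpers `row_major_ids`, `column_major_ids` and `flat_offsets` are hypothetical names introduced for the example. It assumes that consecutive work items are enumerated with either the last dimension varying fastest (row-major) or the first dimension varying fastest (column-major), and that the data is a C-contiguous 2D array. Under those assumptions, it shows that transposing the grid shape and swapping the two indices makes a column-major enumeration match the row-major one. It also shows that picking the wrong convention turns contiguous memory accesses into strided ones, which is one plausible explanation for the overhead and not something measured here.

```python
def row_major_ids(shape):
    """2D ids in linear order when the LAST dimension varies fastest."""
    n0, n1 = shape
    return [(lin // n1, lin % n1) for lin in range(n0 * n1)]


def column_major_ids(shape):
    """2D ids in linear order when the FIRST dimension varies fastest."""
    n0, n1 = shape
    return [(lin % n0, lin // n0) for lin in range(n0 * n1)]


def flat_offsets(ids, n_cols):
    """Offsets touched in a C-contiguous array with n_cols columns."""
    return [i * n_cols + j for (i, j) in ids]


shape = (2, 4)  # (rows, cols) of the grid and of the array it indexes

# Intended enumeration: consecutive work items touch consecutive offsets.
expected = row_major_ids(shape)
contiguous = flat_offsets(expected, n_cols=shape[1])

# Wrong assumption: same shape, but the first dimension varies fastest,
# so consecutive work items jump across rows.
strided = flat_offsets(column_major_ids(shape), n_cols=shape[1])

# Fix sketched in the comment above: transpose the shape, then swap indices.
transposed = column_major_ids((shape[1], shape[0]))
fixed = [(j, i) for (i, j) in transposed]

print(contiguous)
print(strided)
print(fixed == expected)
```

The first printed list steps through memory one element at a time, while the second alternates between the two rows. The final comparison checks that the transposed and swapped enumeration reproduces the intended one.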