Closed: fcharras closed this 1 year ago.
Out of WIP. See the top-level post for the list of changes.
Something is wrong with `sum(axis=1)` on CPU when `work_group_size` is >= 8. Maybe another bug in the JIT, or something we don't understand about group sizes on CPU (or a bit of both). In both cases the investigation looks complicated and CPU is not the priority target, so the latest commit proposes forcing `work_group_size == 8` where `work_group_size` was previously set to max.
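For illustration, here is a minimal sketch (using dpctl, with a hypothetical helper name and parameter, not the PR's actual code) of the kind of device-dependent capping being described, where CPU devices get the small fixed size discussed above and other devices keep the device maximum:

```python
import dpctl


def pick_work_group_size(device, forced_cpu_work_group_size=8):
    """Hypothetical helper: force a small fixed work-group size on CPU
    devices (8 is the value discussed in this thread), and fall back to
    the device maximum elsewhere."""
    if device.device_type == dpctl.device_type.cpu:
        return forced_cpu_work_group_size
    return device.max_work_group_size


device = dpctl.SyclDevice()  # default device
print(pick_work_group_size(device))
```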
I'll also run this branch on the dev cloud with the flex170 to see if it shows any performance improvement, and play with the group size to see if it affects performance more than it seems to on the local iGPU.
I've reduced the last failure on the pipeline to what appears to be a JIT issue. The merge will be blocked until it's fixed. It's likely the same JIT issue we've seen before, and this time it gave a much nicer reproducer. See https://github.com/IntelPython/numba-dpex/issues/906 .
TY for the careful review @ogrisel @jjerphan. The last commit should address your suggestions, and I've answered some of your comments.
Perfect. :+1:
I'll leave the merge to @ogrisel.
The tests are still red. OK to merge once they're fixed.
The tests fail because of https://github.com/IntelPython/numba-dpex/issues/906 ; we can merge as soon as it's fixed and we can bump. If there's a conflict with other branches before that, we can merge everything except the diff on the 2d sum kernel `shape`, `centroids_private_copies_max_cache_occupancy`, and the improved heuristic using `device.max_compute_units`.
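As a rough illustration of the kind of heuristic mentioned (hypothetical names and formula, not this PR's implementation), the number of private centroid copies could be bounded both by a cache occupancy budget and by `device.max_compute_units`:

```python
import dpctl


def n_centroids_private_copies(device, n_clusters, n_features, itemsize,
                               centroids_private_copies_max_cache_occupancy=0.7):
    """Hypothetical sketch: bound the number of private copies of the
    centroids buffer by the share of the global memory cache we allow
    them to occupy, and never exceed the number of compute units."""
    centroids_nbytes = n_clusters * n_features * itemsize
    cache_budget = (centroids_private_copies_max_cache_occupancy
                    * device.global_mem_cache_size)
    n_copies = int(cache_budget // max(centroids_nbytes, 1))
    return max(1, min(n_copies, device.max_compute_units))


device = dpctl.SyclDevice()
print(n_centroids_private_copies(device, n_clusters=128, n_features=14, itemsize=4))
```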
It's WIP because using 2D groups for the sum over axis 0 kernel seems to trigger weird bugs like in https://github.com/IntelPython/numba-dpex/issues/892 in some of the tests. Out of WIP: the remaining issues were unrelated, the PR is green.
edit: confirmed affected by https://github.com/IntelPython/numba-dpex/issues/906
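For reference, a bare-bones sketch of a sum-over-axis-0 kernel launched with a 2D index space, in the spirit of the kernel discussed above but not taken from this PR. It uses the legacy `numba_dpex` launch syntax (which depends on the numba-dpex version), an atomic add per element instead of a proper local reduction, and an illustrative (8, 8) work-group size:

```python
import numpy as np
import dpctl.tensor as dpt
import numba_dpex as dpex


@dpex.kernel
def sum_axis0_kernel(X, out):
    # One work item per (row, col) element of X; partial results are
    # accumulated into out[col] with an atomic add.
    row = dpex.get_global_id(0)
    col = dpex.get_global_id(1)
    dpex.atomic.add(out, col, X[row, col])


n_rows, n_cols = 1024, 64
X = dpt.ones((n_rows, n_cols), dtype=np.int32)
out = dpt.zeros(n_cols, dtype=np.int32)

# 2D global size, 2D work-group size (legacy kernel[global, local] syntax).
sum_axis0_kernel[(n_rows, n_cols), (8, 8)](X, out)
print(dpt.asnumpy(out))  # each entry should equal n_rows
```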