Closed dmitriy-sobolev closed 4 weeks ago
I've found that the changes result in incorrect sort on Nvidia GPUs. I am debugging it.
Upd. It's been fixed in https://github.com/oneapi-src/oneDPL/pull/1735/commits/d92c9de5a6347571d8993ceffb11cfd82838dc9e.
@dmitriy-sobolev , in this PR you have two calls of depends_on
: mandatory required to capture their params by value (now it captured by reference so it's will work incorrectly).
UPD: already fixed.
@dmitriy-sobolev , in this PR you have two calls of
depends_on
: mandatory required to capture their params by value (now it captured by reference so it's will work incorrectly).
Thanks. I've fixed it.
@dmitriy-sobolev please take a look to the code like this:
const auto& __rng1 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset);
const auto& __rng2 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset + __n1);
1) &
isn't required here;
2) Really we create the new instance of drop_view_simple
in this code, so I propose to rewrite it:
const oneapi::dpl::__ranges::drop_view_simple __rng1(__dst, __offset);
const oneapi::dpl::__ranges::drop_view_simple __rng2(__dst, __offset + __n1);
There are four places...
@dmitriy-sobolev please take a look to the code like this:
const auto& __rng1 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset); const auto& __rng2 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset + __n1);
&
isn't required here;- Really we create the new instance of
drop_view_simple
in this code, so I propose to rewrite it:const oneapi::dpl::__ranges::drop_view_simple __rng1(__dst, __offset); const oneapi::dpl::__ranges::drop_view_simple __rng2(__dst, __offset + __n1);
There are four places...
Done
Is it critical for performance that data-per-workitem is a template parameter of __leaf_sorter
and __group_merge_path_sorter
, instead of a runtime parameter? I am concerned about the proliferation of kernel instantiations, and the only reason why leaf sorters have different C++ type is that parameter.
Is it critical for performance that data-per-workitem is a template parameter of
__leaf_sorter
and__group_merge_path_sorter
, instead of a runtime parameter? I am concerned about the proliferation of kernel instantiations, and the only reason why leaf sorters have different C++ type is that parameter.
No it is not. As far as I remember the performance impact is marginal. Probably, it is better to get rid of it for now.
Process more items at the leaf-sort stage. It substantially improves the performance on GPU devices.
The proposed implementation is the most genetic since it sorts the data stably, and relies only on the move-assignability and move-copyability of the type (according to the sort algorithm requirements). It can be specialized in the future.