Open wence- opened 1 year ago
We also have a related problem in the C++ layer, where we sort indices and then gather in places like `segmented_sort_by_key_common`. At the libcudf level, we have to consider nulls and nontrivial types (where a gather is needed for handling those), but the same general idea applies: sort directly wherever possible. https://github.com/rapidsai/cudf/pull/13669#discussion_r1268276705
During review of #13419 we noted a few places where there is a pattern like:
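A minimal sketch of the pattern meant here, an `argsort` immediately followed by a `take`, written with plain Python lists rather than cudf objects. The helper names below are illustrative stand-ins, not cudf or libcudf APIs; the last helper shows the same result computed as a single sort-by-key step.

```python
def argsort(keys):
    """Return the stable permutation of indices that sorts `keys`."""
    return sorted(range(len(keys)), key=keys.__getitem__)


def take(values, indices):
    """Gather: out[i] = values[indices[i]]."""
    return [values[i] for i in indices]


def sort_by_key(values, keys):
    """Reorder `values` by `keys` in one step (stable, like Python's sorted)."""
    return [v for _, v in sorted(zip(keys, values), key=lambda kv: kv[0])]


keys = [30, 10, 20, 10]
values = ["a", "b", "c", "d"]

# The two-step pattern: compute a sorting permutation, then gather with it.
pattern_result = take(values, argsort(keys))

# The same intent expressed directly.
direct_result = sort_by_key(values, keys)
```

Both `pattern_result` and `direct_result` hold the values reordered by their keys, with ties kept in their original order.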
As well as the unnecessary bounds check (see #13456), this is a pattern that is captured by libcudf's `sort_by_key` and `stable_sort_by_key` functions (we would want to use the latter in pandas-compat mode).

At present, libcudf implements this as an `argsort` of the key columns followed by a gather. But that's an implementation detail (there may in the future be updates to that implementation). In the Python layer we should "say what we mean" and call into the appropriate libcudf API.

A cursory search shows:
Of these, the calls in `_base_index.py`, `_column.py`, and `groupby.py` can definitely be replaced by `sort_by_key`. Note also that none of these calls pass `check_bounds=False` to `take`, so they incur an unnecessary kernel launch to check in-boundsness for something that is guaranteed to be in bounds.

The `take(argsort().argsort())` pattern is not a `sort_by_key`; however, we can elide one of the argsorts by noticing that `take` is a gather operation and, for a permutation, the dual to gather is scatter. So this should be implemented as `df.scatter(index.argsort())` instead...

These are just the cases where an argsort is immediately followed by a take; more diligent searching would probably find more.
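The gather/scatter duality can be checked on a small permutation. The following is a minimal sketch using plain Python lists as a stand-in for cudf columns; `argsort`, `take`, and `scatter` are illustrative helpers defined here, not the cudf methods of the same names, and the real cudf scatter signature may differ.

```python
def argsort(keys):
    """Return the stable permutation of indices that sorts `keys`."""
    return sorted(range(len(keys)), key=keys.__getitem__)


def take(values, indices):
    """Gather: out[i] = values[indices[i]]."""
    return [values[i] for i in indices]


def scatter(values, indices):
    """Scatter: out[indices[i]] = values[i]; `indices` must be a permutation."""
    out = [None] * len(values)
    for i, target in enumerate(indices):
        out[target] = values[i]
    return out


keys = [30, 10, 20, 10]
values = ["a", "b", "c", "d"]

order = argsort(keys)

# Two argsorts: the second one inverts the permutation, then we gather.
two_argsorts = take(values, argsort(order))

# One argsort: scatter through the permutation directly.
one_argsort = scatter(values, order)
```

Because `order` is a permutation, `argsort(order)` is its inverse, and gathering through the inverse gives the same result as scattering through `order`, so the second sort is redundant.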