Open wence- opened 9 months ago
If this issue is still open and relevant, I would like to give it a try @wence- Thank you!
Hi @MananDoshi1301 - we are in the process of refactoring our Cython internals. This effort is described here:
https://github.com/rapidsai/cudf/issues/13921
So while you're touching sort.pyx
, I wonder if it's worth also just refactoring it out into the new pylibcudf
package. Is that something you would be interested in taking on?
Happy to answer any questions you might have here, or on the RAPIDS Slack!
Yes absolutely, I'd love to do it. Thanks for the heads up @shwina!
Apologies - it looks like sort
was ported over to pylibcudf as part of https://github.com/rapidsai/cudf/pull/15011.
If you're interested in contributing, many of the open Python issues should be quite approachable:
https://github.com/rapidsai/cudf/issues?q=is%3Aopen+is%3Aissue+label%3A%22cuDF+%28Python%29%22
If anything piques your interest, please let us know!
Where libcudf search/sort functions accept a
table_view
as input, one must specify the order (ascending or descending) and null precedence (beginning or end) as astd::vector
of length equal to the number of columns in thetable_view
(or else an empty such vector to used libcudf's defaults).In the cudf cython wrapping of these functions we are, in contrast, inconsistent in the way we handle these arguments. Some functions accept a list for both order and null precedence (e.g.
libcudf.sort.sort
); some accept a list for order but only a single value for null precedence (e.g.libcudf.sort.order_by
); some accept a list for neither order nor null precedence (e.g.libcudf.search.search_sorted
).This should be cleaned up and all such functions should uniformly accept a list for both order and null precedence. Higher-level python functions that operate on single columns and call the table interfaces should be responsible for any argument munging.