rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0
8.23k stars 883 forks source link

[FEA] Remove inconsistencies in cython wrappers when handling order/null-precedence #14492

Open wence- opened 9 months ago

wence- commented 9 months ago

Where libcudf search/sort functions accept a table_view as input, one must specify the order (ascending or descending) and null precedence (beginning or end) as a std::vector of length equal to the number of columns in the table_view (or else an empty such vector to used libcudf's defaults).

In the cudf cython wrapping of these functions we are, in contrast, inconsistent in the way we handle these arguments. Some functions accept a list for both order and null precedence (e.g. libcudf.sort.sort); some accept a list for order but only a single value for null precedence (e.g. libcudf.sort.order_by); some accept a list for neither order nor null precedence (e.g. libcudf.search.search_sorted).

This should be cleaned up and all such functions should uniformly accept a list for both order and null precedence. Higher-level python functions that operate on single columns and call the table interfaces should be responsible for any argument munging.

MananDoshi1301 commented 7 months ago

If this issue is still open and relevant, I would like to give it a try @wence- Thank you!

shwina commented 6 months ago

Hi @MananDoshi1301 - we are in the process of refactoring our Cython internals. This effort is described here:

https://github.com/rapidsai/cudf/issues/13921

So while you're touching sort.pyx, I wonder if it's worth also just refactoring it out into the new pylibcudf package. Is that something you would be interested in taking on?

Happy to answer any questions you might have here, or on the RAPIDS Slack!

MananDoshi1301 commented 6 months ago

Yes absolutely, I'd love to do it. Thanks for the heads up @shwina!

shwina commented 6 months ago

Apologies - it looks like sort was ported over to pylibcudf as part of https://github.com/rapidsai/cudf/pull/15011.

If you're interested in contributing, many of the open Python issues should be quite approachable:

https://github.com/rapidsai/cudf/issues?q=is%3Aopen+is%3Aissue+label%3A%22cuDF+%28Python%29%22

If anything piques your interest, please let us know!