rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0
8.38k stars 896 forks source link

[FEA] Support Polars `top_k` expression (column reduction) #16223

Open beckernick opened 3 months ago

beckernick commented 3 months ago

The top_k expression is essentially an nlargest + limit operation and is sometimes used to short circuit sort + limit scenarios common in SQL. E.g., SELECT * FROM table ORDER BY col DESC LIMIT 10 and the equivalent in DataFrame operations.

import polars as pl
from functools import partial
from cudf_polars.callback import execute_with_cudf

use_cudf = partial(execute_with_cudf, raise_on_fail=True) # for testing

df = pl.LazyFrame(
    {
        'a': [1,1,2,2,3,3],
        'b':[1,2,3,4,5,6],
    }
)

print(df.select(pl.col("b").top_k(1)).collect())
print(df.select(pl.col("b").top_k(1)).collect(post_opt_callback=use_cudf))
shape: (1, 1)
┌─────┐
│ b   │
│ --- │
│ i64 │
╞═════╡
│ 6   │
└─────┘
---------------------------------------------------------------------------
ComputeError                              Traceback (most recent call last)
Cell In[32], line 15
      7 df = pl.LazyFrame(
      8     {
      9         'a': [1,1,2,2,3,3],
     10         'b':[1,2,3,4,5,6],
     11     }
     12 )
     14 print(df.select(pl.col("b").top_k(1)).collect())
---> 15 print(df.select(pl.col("b").top_k(1)).collect(post_opt_callback=use_cudf))

File [/raid/nicholasb/miniconda3/envs/all_cuda-122_arch-x86_64/lib/python3.11/site-packages/polars/lazyframe/frame.py:1942](http://10.117.23.184:8882/lab/tree/raid/nicholasb/raid/nicholasb/miniconda3/envs/all_cuda-122_arch-x86_64/lib/python3.11/site-packages/polars/lazyframe/frame.py#line=1941), in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
   1939 # Only for testing purposes atm.
   1940 callback = _kwargs.get("post_opt_callback")
-> 1942 return wrap_df(ldf.collect(callback))

ComputeError: 'cuda' conversion failed: NotImplementedError: Unary function name='top_k'

(Related but separate from #16222)

wence- commented 3 months ago

Since (for now) there are no partial_sort/quickselect algorithms in CUB/thrust (see https://github.com/NVIDIA/cccl/issues/931 and https://github.com/NVIDIA/cccl/issues/685), this needs to be implemented via full sort + slice.

This is best done by (rather than implementing top_k as a special-case in the UnaryFunction handler) translating a call to top_k into Sort + slice (note we need to implement slice for that too, but that's relatively straightforward).

Polars actually doesn't guarantee that the returned values are sorted, just that they are the top_k, so that gives freedom in the implementation if/when something is implemented in CCCL