rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Allow `sort_values` to accept same `kind` values as Pandas #8911

Closed: sarahyurick closed this issue 3 years ago

sarahyurick commented 3 years ago

For compatibility with Pandas, `DataFrame.sort_values` should accept every `kind` value that Pandas accepts, even if cuDF does not actually use all of them. Right now, running

import cudf
a = [0,1,2]
b = [-3, 2, 0]
df = cudf.DataFrame()
df["a"] = a
df["b"] = b
df.sort_values(by='b')

works with the default `kind='quicksort'`. When I try a different sorting algorithm, I get:

df.sort_values(by='b', kind='mergesort')
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/tmp/ipykernel_59179/3536691751.py in <module>
----> 1 df.sort_values(by='b', kind='mergesort')

~/miniconda3/envs/cudf_dev/lib/python3.8/site-packages/cudf/core/dataframe.py in sort_values(self, by, axis, ascending, inplace, kind, na_position, ignore_index)
   3907         if kind != "quicksort":
   3908             print(kind)
-> 3909             raise NotImplementedError("`kind` not currently implemented.")
   3910         if axis != 0:
   3911             raise NotImplementedError("`axis` not currently implemented.")

NotImplementedError: `kind` not currently implemented.

Pandas currently accepts `kind` in {'quicksort', 'mergesort', 'heapsort', 'stable'}, with a default of 'quicksort', as documented here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html.
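As a rough illustration of "accept, but not necessarily use", here is a minimal sketch of a validation shim that could replace the strict `if kind != "quicksort"` check shown in the traceback. It is not cuDF's actual code or the change made in the linked PR; the names `PANDAS_SORT_KINDS` and `check_sort_kind` are hypothetical, and both the choice to warn on ignored values and the choice to raise `ValueError` for unknown values are assumptions about the desired behavior.

```python
import warnings

# Hypothetical constant: the `kind` values pandas documents for
# DataFrame.sort_values.
PANDAS_SORT_KINDS = ("quicksort", "mergesort", "heapsort", "stable")


def check_sort_kind(kind="quicksort"):
    """Hypothetical shim: accept every documented pandas `kind` value.

    Any documented value is returned unchanged so callers written for
    pandas keep working; the GPU sort is assumed to ignore it. Values
    outside the documented set raise ValueError (an assumed policy).
    """
    if kind not in PANDAS_SORT_KINDS:
        raise ValueError(
            f"kind must be one of {PANDAS_SORT_KINDS}, got {kind!r}"
        )
    if kind != "quicksort":
        # Assumed design choice: tell the user the value is accepted
        # for compatibility but does not select a different algorithm.
        warnings.warn(
            f"kind={kind!r} is accepted for pandas compatibility but ignored",
            UserWarning,
        )
    return kind
```

With this shim, `check_sort_kind("mergesort")` returns `"mergesort"` and emits a `UserWarning` instead of raising `NotImplementedError`, while a typo such as `check_sort_kind("mergsort")` still fails loudly.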

beckernick commented 3 years ago

Closed by https://github.com/rapidsai/cudf/pull/8912