narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!
https://narwhals-dev.github.io/narwhals/
MIT License

Raise NotImplementedError in `pivot` for cuDF if `pivot_table` is called with `observed=True`, backend is cuDF, and there are any categoricals #1400

Open MarcoGorelli opened 1 week ago

MarcoGorelli commented 1 week ago

cuDF tests for pivot are failing: https://www.kaggle.com/code/marcogorelli/testing-cudf-in-narwhals?scriptVersionId=208170308

I think the simplest fix would be, in

https://github.com/narwhals-dev/narwhals/blob/89b24a5e76727fc2d227fe2e50540b9aa4b4ce78/narwhals/_pandas_like/dataframe.py#L851-L858

to do something like

if self._implementation is Implementation.CUDF and any(
    x == self._dtypes.Categorical for x in self.schema.values()
):
    msg = "`pivot` with Categoricals is not implemented for cuDF backend"
    raise NotImplementedError(msg)
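For illustration only, the guard above can be sketched as a standalone function that runs without narwhals or cuDF installed. The `Implementation` enum, the `CATEGORICAL` marker, and `check_pivot_supported` below are hypothetical stand-ins, not narwhals' actual internals, and the schema maps column names to plain strings rather than real dtype objects.

```python
from enum import Enum, auto


class Implementation(Enum):
    """Hypothetical stand-in for the backend enum used in the snippet above."""

    PANDAS = auto()
    CUDF = auto()


# Hypothetical dtype marker; the real code compares against a dtype class.
CATEGORICAL = "Categorical"


def check_pivot_supported(implementation, schema):
    """Raise NotImplementedError for a cuDF frame that has categorical columns."""
    has_categorical = any(dtype == CATEGORICAL for dtype in schema.values())
    if implementation is Implementation.CUDF and has_categorical:
        msg = "`pivot` with Categoricals is not implemented for cuDF backend"
        raise NotImplementedError(msg)


schema = {"ix": "Int64", "col": "Categorical", "val": "Float64"}

# Other backends pass through untouched.
check_pivot_supported(Implementation.PANDAS, schema)

# The cuDF backend with a categorical column is rejected up front.
try:
    check_pivot_supported(Implementation.CUDF, schema)
except NotImplementedError as exc:
    print(exc)
```

Checking the schema before calling into the backend means the user gets a clear, early error instead of a failure from deep inside the pivot call.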

raisadz commented 6 days ago

I started looking into this issue but found that cuDF currently doesn't support passing lists to the `columns` and `index` arguments of `cudf.DataFrame.pivot`.

I opened an issue about it in their repo: https://github.com/rapidsai/cudf/issues/17360