rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Can't repr interchange buffers which don't have supported dlpack dtypes #11320

Closed (honno closed 1 year ago)

honno commented 2 years ago

Calling repr() on an interchange protocol boolean buffer fails with a TypeError:

>>> df = cudf.DataFrame({"foo": [True]})
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("foo")
>>> interchange_buf = interchange_col.get_buffers()["data"][0]
>>> interchange_buf
Traceback (most recent call last)

    File .../cudf/core/df_protocol.py:100, in _CuDFBuffer.__dlpack__(self)
        99     cudarray = as_cuda_array(self._buf).view(self._dtype)
    --> 100     res = cp.asarray(cudarray).toDlpack()
        102 except ValueError:

    File cupy/_core/core.pyx:1919, in cupy._core.core.ndarray.toDlpack()

    File cupy/_core/core.pyx:1957, in cupy._core.core.ndarray.toDlpack()

    File cupy/_core/dlpack.pyx:138, in cupy._core.dlpack.toDlpack()

ValueError: Unknown dtype

During handling of the above exception, another exception occurred:

Traceback (most recent call last)

    ...

    File .../cudf/core/df_protocol.py:118, in _CuDFBuffer.__repr__(self)
        113 def __repr__(self) -> str:
        114     return f"{self.__class__.__name__}(" + str(
        115         {
        116             "bufsize": self.bufsize,
        117             "ptr": self.ptr,
    --> 118             "dlpack": self.__dlpack__(),
        119             "device": self.__dlpack_device__()[0].name,
        120         }
        121     )
        122     +")"

    File .../cudf/core/df_protocol.py:103, in _CuDFBuffer.__dlpack__(self)
        100     res = cp.asarray(cudarray).toDlpack()
        102 except ValueError:
    --> 103     raise TypeError(f"dtype {self._dtype} unsupported by `dlpack`")
        105 return res

TypeError: dtype bool unsupported by `dlpack`

This happens because _CuDFBuffer.__repr__() always calls __dlpack__() (and __dlpack_device__()), and __dlpack__() raises for dtypes that dlpack does not support, such as bool.

https://github.com/rapidsai/cudf/blob/edc5062bdcc3e12755603b0ad07a4d271fe95261/python/cudf/cudf/core/df_protocol.py#L113-L122
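To illustrate the failure mode and one possible way around it, here is a minimal sketch that runs without a GPU. SketchBuffer and DLPACK_UNSUPPORTED are hypothetical stand-ins, not cuDF's actual _CuDFBuffer, and the dtype is a plain string for simplicity. The idea is that __repr__ reports only metadata that exists for every dtype and leaves the DLPack export to callers who ask for it explicitly.

```python
# Hypothetical stand-in for a device buffer; this is NOT cuDF's _CuDFBuffer.
# Assumption for illustration: bool is the only dtype the export rejects.
DLPACK_UNSUPPORTED = {"bool"}


class SketchBuffer:
    def __init__(self, ptr: int, bufsize: int, dtype: str) -> None:
        self.ptr = ptr
        self.bufsize = bufsize
        self._dtype = dtype

    def __dlpack__(self):
        # Mirrors the error in the traceback: unsupported dtypes raise.
        if self._dtype in DLPACK_UNSUPPORTED:
            raise TypeError(f"dtype {self._dtype} unsupported by `dlpack`")
        return object()  # placeholder for a real DLPack capsule

    def __repr__(self) -> str:
        # Only include fields that are available regardless of dtype,
        # so repr() never triggers the DLPack export.
        fields = {"bufsize": self.bufsize, "ptr": self.ptr}
        return f"{self.__class__.__name__}({fields})"


buf = SketchBuffer(ptr=0x1000, bufsize=1, dtype="bool")
print(repr(buf))  # works even though buf.__dlpack__() would raise
```

With this shape, repr() stays usable for debugging any buffer, while an unsupported dtype still surfaces as a TypeError at the point where __dlpack__() is actually requested.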

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

honno commented 1 year ago

This looks to be fixed, so closing :slightly_smiling_face:

>>> df = cudf.DataFrame({"foo": [True]})
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("foo")
>>> interchange_buf = interchange_col.get_buffers()["data"][0]
>>> interchange_buf
_CuDFBuffer({'bufsize': 1, 'ptr': 140494990475776, 'device': 'CUDA'}