rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Interchange `Column.dtype` returns format strings in NumPy-style, instead of Arrow-style #11389

Open honno opened 2 years ago

honno commented 2 years ago

In the interchange protocol, `Column.dtype` should return an Arrow-style format string (as defined by the Arrow C data interface), but a NumPy-style one is returned instead:

>>> import cudf
>>> df = cudf.DataFrame({"foo": cudf.Series([0, 1], dtype="int8")})
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("foo")
>>> interchange_col.dtype
(<_DtypeKind.INT: 0>, 8, '|i1', '|')  # 3rd element (format string) should be "c"
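For comparison, here is a minimal sketch of the tuple one would expect for fixed-width numeric columns: (kind, bit width, Arrow format string, byte order). The helper `expected_dtype_tuple` is hypothetical and not part of cuDF, and the local `DtypeKind` enum is an assumed subset that mirrors the protocol's values (the session above shows `INT` as 0). The format strings are the Arrow C data interface codes, e.g. "c" for int8 as noted above.

```python
import enum

import numpy as np


class DtypeKind(enum.IntEnum):
    # Assumed subset of the protocol's dtype kinds (values mirror the spec).
    INT = 0
    UINT = 1
    FLOAT = 2


# Arrow C data interface format strings for fixed-width numeric types.
_ARROW_FORMAT = {
    "int8": "c", "int16": "s", "int32": "i", "int64": "l",
    "uint8": "C", "uint16": "S", "uint32": "I", "uint64": "L",
    "float16": "e", "float32": "f", "float64": "g",
}

# NumPy dtype.kind character -> protocol dtype kind.
_KIND = {"i": DtypeKind.INT, "u": DtypeKind.UINT, "f": DtypeKind.FLOAT}


def expected_dtype_tuple(dtype):
    """Hypothetical helper: build (kind, bit width, Arrow format, byte order)."""
    dt = np.dtype(dtype)
    return (_KIND[dt.kind], dt.itemsize * 8, _ARROW_FORMAT[dt.name], dt.byteorder)


print(expected_dtype_tuple("int8"))  # (<DtypeKind.INT: 0>, 8, 'c', '|')
```

Only the third element differs from what cuDF currently reports; the kind, bit width and byte order already match.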

It looks like the `.str` attribute of the underlying dtype object (i.e. `np.dtype(...).str`, a NumPy array-interface type string such as '|i1') is currently returned as-is:

https://github.com/rapidsai/cudf/blob/edc5062bdcc3e12755603b0ad07a4d271fe95261/python/cudf/cudf/core/df_protocol.py#L260
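If the code keeps starting from `.str`, one possible direction is to translate that type string instead of passing it through. The sketch below assumes only fixed-width integer and float types need handling; `typestr_to_arrow_format` and its lookup table are hypothetical names for illustration, not cuDF's actual fix.

```python
import numpy as np

# Hypothetical table: (NumPy kind char, item size in bytes) -> Arrow format string.
_TYPESTR_TO_ARROW = {
    ("i", 1): "c", ("i", 2): "s", ("i", 4): "i", ("i", 8): "l",
    ("u", 1): "C", ("u", 2): "S", ("u", 4): "I", ("u", 8): "L",
    ("f", 2): "e", ("f", 4): "f", ("f", 8): "g",
}


def typestr_to_arrow_format(typestr: str) -> str:
    """Translate a NumPy typestr such as '|i1' or '<f8' to an Arrow format."""
    # typestr layout: byte-order char, kind char, item size in bytes.
    # Byte order is dropped here; the protocol reports it in a separate field.
    kind, size = typestr[1], typestr[2:]
    if not size.isdigit() or (kind, int(size)) not in _TYPESTR_TO_ARROW:
        # e.g. '<M8[ns]' (datetime) or '|O' (object) are not covered.
        raise NotImplementedError(f"unsupported typestr {typestr!r}")
    return _TYPESTR_TO_ARROW[(kind, int(size))]


print(typestr_to_arrow_format(np.dtype("int8").str))  # c
```

Datetime, string and categorical columns would need their own handling (Arrow uses parameterized formats for timestamps), which is why they raise here rather than guess.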

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.