rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Support cross-casting to/from strings in cudf-polars #16479

Open wence- opened 3 months ago

wence- commented 3 months ago

In Polars, converting from string to integer (say), and vice versa, is done inside an expression by casting between types, for example pl.col("a").cast(pl.String).

In libcudf this is possible, but it must be done with the appropriate cudf::strings conversion routines rather than by calling cudf::cast.

Since casts like this are common in data ingest pipelines, we should consider supporting them.
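
To make the distinction above concrete, here is a minimal sketch of the routing decision: casts that involve a string column go to a cudf::strings conversion routine, and everything else goes to cudf::cast. The helper choose_routine and the "kind" labels are hypothetical and exist only for illustration; the libcudf function names appear as plain strings, and nothing here calls libcudf or reflects how cudf-polars is actually implemented.

```python
# Hypothetical routing table: the values are names of libcudf functions,
# used here only as labels. Nothing in this sketch calls libcudf.
STRING_CONVERSIONS = {
    ("integer", "string"): "cudf::strings::from_integers",
    ("string", "integer"): "cudf::strings::to_integers",
    ("float", "string"): "cudf::strings::from_floats",
    ("string", "float"): "cudf::strings::to_floats",
}


def choose_routine(from_kind: str, to_kind: str) -> str:
    """Return the name of the routine a cast would be routed to."""
    if from_kind == to_kind:
        return "no-op"
    if "string" in (from_kind, to_kind):
        try:
            return STRING_CONVERSIONS[(from_kind, to_kind)]
        except KeyError:
            raise NotImplementedError(
                f"no string conversion listed for {from_kind} -> {to_kind}"
            ) from None
    # Neither side is a string, so a plain cast is enough.
    return "cudf::cast"


print(choose_routine("integer", "string"))  # cudf::strings::from_integers
print(choose_routine("string", "integer"))  # cudf::strings::to_integers
print(choose_routine("integer", "float"))   # cudf::cast
```

Keeping the string conversions in a small table makes it explicit which cross-casts are covered and which would still need to raise.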

beckernick commented 2 months ago

@wence- @brandon-b-miller, this may now be resolved?

import polars as pl

df = pl.LazyFrame({
    "a": [0,1,2]
})
print(df.select(pl.col("a").cast(pl.String)).collect(engine="gpu"))

df = pl.LazyFrame({
    "a": ["0", "1", "3"]
})
print(df.select(pl.col("a").cast(pl.Int32)).collect(engine="gpu"))

shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ str │
╞═════╡
│ 0   │
│ 1   │
│ 2   │
└─────┘
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i32 │
╞═════╡
│ 0   │
│ 1   │
│ 3   │
└─────┘

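EDIT: Nope, I forgot about CPU fallback ;) With engine="gpu", an unsupported query silently falls back to the CPU engine, so the output above does not show that the casts ran on the GPU.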