rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Support cross-casting to/from strings in cudf-polars #16479

Open wence- opened 1 month ago

wence- commented 1 month ago

In polars, converting from string to integer (say), and vice versa, is done within an expression by casting between types, for example pl.col("a").cast(pl.String).

In libcudf, this is possible, but must be carried out by appropriate cudf::strings conversion routines instead of calling cudf::cast.

Since casting like this is often used as part of data ingest pipelines, we should consider supporting it.
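
To illustrate the distinction above, here is a minimal sketch of how a cast could be routed depending on whether a string type is involved. The `Kind` enum and the `cast_route` function are hypothetical names made up for this illustration and are not part of cudf-polars. The returned strings name the libcudf routines that, as far as I know, live in the strings conversion headers. Only integer and float targets are covered here.

```python
from enum import Enum, auto


class Kind(Enum):
    """Coarse dtype categories (a hypothetical simplification for this sketch)."""
    STRING = auto()
    INTEGER = auto()
    FLOAT = auto()


def cast_route(src: Kind, dst: Kind) -> str:
    """Return the name of the libcudf entry point a cast would be routed to."""
    if src is Kind.STRING and dst is Kind.STRING:
        # Nothing to convert.
        return "no-op"
    if src is Kind.STRING:
        # Parsing strings into numbers needs a strings conversion routine.
        return {
            Kind.INTEGER: "cudf::strings::to_integers",
            Kind.FLOAT: "cudf::strings::to_floats",
        }[dst]
    if dst is Kind.STRING:
        # Formatting numbers as strings likewise bypasses cudf::cast.
        return {
            Kind.INTEGER: "cudf::strings::from_integers",
            Kind.FLOAT: "cudf::strings::from_floats",
        }[src]
    # Plain numeric-to-numeric casts can use the generic cast.
    return "cudf::cast"


print(cast_route(Kind.INTEGER, Kind.STRING))
print(cast_route(Kind.STRING, Kind.INTEGER))
print(cast_route(Kind.INTEGER, Kind.FLOAT))
```

A real implementation would also have to decide what to do with strings that do not parse as numbers, which this sketch ignores.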

beckernick commented 2 weeks ago

@wence- @brandon-b-miller, this may now be resolved?

import polars as pl

df = pl.LazyFrame({
    "a": [0,1,2]
})
print(df.select(pl.col("a").cast(pl.String)).collect(engine="gpu"))

df = pl.LazyFrame({
    "a": ["0", "1", "3"]
})
print(df.select(pl.col("a").cast(pl.Int32)).collect(engine="gpu"))

shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ str │
╞═════╡
│ 0   │
│ 1   │
│ 2   │
└─────┘
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i32 │
╞═════╡
│ 0   │
│ 1   │
│ 3   │
└─────┘

EDIT: Nope, forgot about CPU fallback ;)
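
For anyone re-checking this later: with engine="gpu", polars by default falls back to the CPU engine when the GPU engine cannot run a query, so a correct result alone does not show that the cast ran on the GPU. A minimal sketch of a stricter check follows, assuming a polars version that provides `pl.GPUEngine` with the `raise_on_fail` option and an installed cudf-polars. The helper name `collect_on_gpu_strictly` is made up for this example.

```python
def collect_on_gpu_strictly(lazy_frame):
    """Collect a LazyFrame on the GPU engine, erroring instead of falling back.

    polars is imported lazily so that defining this helper does not
    require a GPU-enabled installation.
    """
    import polars as pl

    # raise_on_fail=True asks polars to raise when the GPU engine cannot
    # execute the query, rather than silently running it on the CPU.
    engine = pl.GPUEngine(raise_on_fail=True)
    return lazy_frame.collect(engine=engine)


# Usage (requires polars with GPU support installed):
#
#   import polars as pl
#   df = pl.LazyFrame({"a": ["0", "1", "3"]})
#   collect_on_gpu_strictly(df.select(pl.col("a").cast(pl.Int32)))
```

If the string cast is still unsupported on the GPU, the call should raise rather than return a frame, which makes the difference from the earlier example visible.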