pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

support expressions in `Frame.unique()` #15485

Open baggiponte opened 1 month ago

baggiponte commented 1 month ago

Description

Ciao! @ritchie46 suggested that I open an issue. Currently, `unique()` does not support expressions:

import polars as pl

data = pl.DataFrame({"a": ["A", "a", "b", "B"]})

# raises: TypeError: argument 'subset': 'Expr' object cannot be converted to 'PyString'
data.unique(pl.col("a").str.to_lowercase())

# this works
data.with_columns(a_lower=pl.col("a").str.to_lowercase()).unique("a_lower").drop("a_lower")

Could `unique()` support expressions?

mkleinbort commented 1 month ago

You could do

data.select(pl.col("a").str.to_lowercase()).unique()

But yeah, it would be nice to support expressions in `.unique()`.

baggiponte commented 1 month ago

> You could do
>
> data.select(pl.col("a").str.to_lowercase()).unique()

Yes and no: I would like to remove all case-insensitive duplicates but still retain the text as originally formatted. Selecting the lowercased column and calling `unique()` returns the lowercased values instead of the original ones.