pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Reordering columns using ellipsis #12067

Open balakhaniyan opened 11 months ago

balakhaniyan commented 11 months ago

Description

Reordering columns is often useful, but in Polars and other DataFrame libraries it takes extra code even when only 2 or 3 columns need to move. Supporting Ellipsis would make this easy and handy in many cases:

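import polars as pl

df = pl.DataFrame(schema=dict(
    fname=pl.Utf8,
    lname=pl.Utf8,
    age=pl.UInt8,
    favColor=pl.Utf8,
    money=pl.Decimal,
    workhour=pl.Float64,
))
# Proposed syntax (not currently supported): Ellipsis stands for the remaining columns.
df.select(pl.col(['lname', 'workhour', ...]))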

This means the lname and workhour columns come first and the other columns follow them. We could also use [..., 'lname', 'workhour'] to put them last, but that is less useful in my opinion. It is a slightly tricky but Pythonic way to reorder columns.

cmdlineluser commented 11 months ago

This has sort of come up a few times previously: https://github.com/pola-rs/polars/issues/10394

You can do a basic version by collecting each expression's .meta.output_name() and replacing ... with pl.exclude(names_seen):

df.select('lname', 'workhour', ...)
# shape: (0, 6)
# ┌───────┬──────────┬───────┬─────┬──────────┬────────────┐
# │ lname ┆ workhour ┆ fname ┆ age ┆ favColor ┆ money      │
# │ ---   ┆ ---      ┆ ---   ┆ --- ┆ ---      ┆ ---        │
# │ str   ┆ f64      ┆ str   ┆ u8  ┆ str      ┆ decimal[0] │
# ╞═══════╪══════════╪═══════╪═════╪══════════╪════════════╡
# └───────┴──────────┴───────┴─────┴──────────┴────────────┘

df.select(..., 'lname', 'workhour')
# shape: (0, 6)
# ┌───────┬─────┬──────────┬────────────┬───────┬──────────┐
# │ fname ┆ age ┆ favColor ┆ money      ┆ lname ┆ workhour │
# │ ---   ┆ --- ┆ ---      ┆ ---        ┆ ---   ┆ ---      │
# │ str   ┆ u8  ┆ str      ┆ decimal[0] ┆ str   ┆ f64      │
# ╞═══════╪═════╪══════════╪════════════╪═══════╪══════════╡
# └───────┴─────┴──────────┴────────────┴───────┴──────────┘

df.select('lname', ..., 'workhour')
# shape: (0, 6)
# ┌───────┬───────┬─────┬──────────┬────────────┬──────────┐
# │ lname ┆ fname ┆ age ┆ favColor ┆ money      ┆ workhour │
# │ ---   ┆ ---   ┆ --- ┆ ---      ┆ ---        ┆ ---      │
# │ str   ┆ str   ┆ u8  ┆ str      ┆ decimal[0] ┆ f64      │
# ╞═══════╪═══════╪═════╪══════════╪════════════╪══════════╡
# └───────┴───────┴─────┴──────────┴────────────┴──────────┘

I'm not sure if it's possible to build a proper pl.exclude() for all possible non-ellipsis inputs though.

alexander-beedie commented 11 months ago

I will poke at this a bit and see if it's realistic to implement... it's certainly one of the more intuitive/reasonable syntax suggestions that I've seen so far for this sort of reordering, though it probably won't interact cleanly with selectors. Or perhaps this could actually be a new cs.remaining() selector (requiring some new internal logic), which Ellipsis gets replaced with, hmm...🤔