posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License

`Polars` expressions unsupported for column selection #340

Closed: jrycw closed this issue 1 month ago

jrycw commented 1 month ago

It appears that while we currently support column selection using Polars selectors, Polars expressions are not yet supported. Is this intentional? Should we consider adding support for Polars expressions as well?

```python
from great_tables import GT
import polars as pl
import polars.selectors as cs
from great_tables.data import exibble

lil_exibble = exibble[["num", "char", "fctr", "date", "time"]].head(4)
pl_df = pl.from_pandas(lil_exibble)

# ok
GT(pl_df).cols_move_to_start(columns=cs.starts_with("char"))

# TypeError: Unsupported selection expr type: <class 'polars.expr.expr.Expr'>
GT(pl_df).cols_move_to_start(columns=pl.col("char"))
```

jrycw commented 1 month ago

I'm curious whether we could implement this feature by changing the line at https://github.com/posit-dev/great-tables/blob/7eae1f269a7d1f787de7a2c9198d775933794fd5/great_tables/_tbl_data.py#L322 to `if not isinstance(expr, (list, cls_selector, pl.Expr)):`.

machow commented 1 month ago

Hey, thanks for raising this -- I think this was intentional, but supporting expressions seems fair to consider.

I think the rationale was:

However, since...

I could see how people might want to use pl.col(). Where selection should occur seems to have been pretty actively discussed a while back (https://github.com/pola-rs/polars/issues/13757; see https://github.com/pola-rs/polars/issues/13757#issuecomment-1903966834 for good language on selection vs. expression).

It might be a good idea to wait a bit and see where the API is a year from now? IMO it's helpful that requiring a selector nudges people away from expressions in column selection. But I could see why pl.col() is convenient (as evidenced by that issue).

As a slightly related point, I noticed that cs.expand_selector() actually returns the resulting column names. We use it to select columns, so this behavior could cause issues for us. I opened an issue in case it's a bug.

https://github.com/pola-rs/polars/issues/16242

jrycw commented 1 month ago

@machow, I'll close the issue for now and revisit it once the Polars API is stable. After Polars v1.0 is released might be a good time.