posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License
1.81k stars 60 forks source link

Value based styling for pivoted tables? #389

Open igorcalabria opened 3 months ago

igorcalabria commented 3 months ago

Question

What would you like to know?

Hi, not sure if I've missed something but is there an easy way to apply conditional styles (https://posit-dev.github.io/great-tables/get-started/basic-styling.html) to pivoted tables (ie. Dates as columns)?

I know it's possible to pass a polars column selector in the location body, but I haven´t found a proper way to select rows per column automatically.

Something like this

 tab_style(
        style=style.text(color="white"),
        locations=loc.body(
            columns=cs.numeric(),
            rows=pl.col("each-column-from-the-selector").eq(0)
        )
    )

The alternative is applying styles on individual columns but that's a bit cumbersome when columns are dynamic

jrycw commented 3 months ago

@igorcalabria If I understand correctly, your idea should work as expected.

import polars as pl
from great_tables import GT, loc, style
from great_tables.data import gtcars
from polars import selectors as cs

(
    GT(pl.from_pandas(gtcars).head())
    .tab_style(
        style=style.text(color="red"),
        locations=loc.body(
            columns=cs.numeric(),
            rows=pl.col("year").eq(2017.0)
        )
    )
)

image

jrycw commented 3 months ago

Oh, I think I understand what you're looking for now. Maybe something like this:

import polars as pl
from great_tables import GT, loc, style
from great_tables.data import gtcars
from polars import selectors as cs

df_mini = pl.from_pandas(gtcars).head()
gt = GT(df_mini)

for column in df_mini.select(cs.numeric()).columns:
    gt = gt.tab_style(
        style=style.text(color="red"),
        locations=loc.body(columns=column, rows=pl.col(column).gt(600)),
    )

gt

image

igorcalabria commented 3 months ago

Yeah that's pretty much it. A closer example is something like this

import polars as pl
from great_tables import GT, loc, style
from great_tables.data import gtcars
from polars import selectors as cs

df_mini = pl.from_pandas(gtcars).head().pivot(on="year", index=["trim", "mfr"], values="hp")
gt = GT(df_mini,rowname_col="trim", groupname_col="mfr")
for dt in df_mini.select(cs.numeric()).columns:
    gt = gt.tab_style(
        style=style.text(color="red"),
        locations=loc.body(columns=dt, rows=pl.col(dt).gt(600))
    )
gt

image

My question was if there's a more idiomatic way of doing it. I feel like this kind of pivots on dates are super common (there's one in the examples page).

From a user's point of view, I think the most natural API would be to simply style stuff based based on cell values and not the entire series:

    gt = gt.tab_style(
        style=style.text(color="red"),
        locations=loc.body(columns=cs.numeric(), rows=pl.col("cell_value").gt(600))
    )

I understand that this may be super awkward to implement across different dataframe implementations and that pl.col("cell-value") doesn't make much sense from polars point of view so It can get confusing too

EDIT:

Maybe

    gt = gt.tab_style(
        style=style.text(color="red"),
       # if rows is a lambda expr, call it for each column that matches the selector
        locations=loc.body(columns=cs.numeric(), rows=lambda col: pl.col(col).gt(600))
    )
jrycw commented 3 months ago

Thanks for your feedback. Personally, I'm comfortable calling multiple tab_style() to achieve the desired result, as it seems to be the most idiomatic approach I'm aware of.

machow commented 3 months ago

Hey thanks for flagging this, and for the useful examples. @rich-iannone and I have talked before about potentially adding a loc.body(mask=...) argument, that could handle the multiple cols selected -> those cols transformed situation.

Something like this...

import polars as pl
from great_tables import GT, loc, style
from great_tables.data import gtcars
from polars import selectors as cs

df_mini = pl.from_pandas(gtcars).head().pivot(on="year", index=["trim", "mfr"], values="hp")
gt = (
    GT(df_mini,rowname_col="trim", groupname_col="mfr")
    .tab_style(
        style=style.text(color="red"),
        locations=loc.body(mask = cs.numeric().gt(600))
    )
)

Would this cover your usecase?

igorcalabria commented 3 months ago

Yes, that's exactly what I needed. It's also way better than my suggestion, makes a lot of sense to use masks for styling and it fits pretty well with polars(and probably other dataframe) APIs