posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License

How about implementing `GT.pipe()`? #353

Open jrycw opened 1 month ago

jrycw commented 1 month ago

Currently, we can chain GT methods repeatedly, such as GT.tab_style().tab_style()..., which is a common pattern. However, this approach makes it difficult to call a method programmatically, for example once per column in a loop.

Here is an example to illustrate my question:

import polars as pl
from great_tables import GT, html, style, loc
from great_tables.data import towny

towny_mini = pl.from_pandas(towny).head(10)

(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    )
    .tab_header(
        title="The Municipalities of Ontario",
        subtitle="The top 10 highest population density in 2021",
    )
    .tab_stubhead(label="Municipality")
    .fmt_number(columns=["land_area_km2", "density_2021"], decimals=1)
    .cols_label(
        land_area_km2=html("land area, <br>km<sup>2</sup>"),
        density_2021=html("density, <br>people/km<sup>2</sup>"),
    )
    .tab_style(
        style=style.fill(color="lightgray"),
        locations=loc.body(
            columns="land_area_km2",
            rows=pl.col("land_area_km2").eq(pl.col("land_area_km2").max()),
        ),
    )
    .tab_style(
        style=style.fill(color="lightblue"),
        locations=loc.body(
            columns="density_2021",
            rows=pl.col("density_2021").eq(pl.col("density_2021").max()),
        ),
    )
)

In this example, I want to highlight the maximum value in each of the land_area_km2 and density_2021 columns, using a different fill color for each. This requires invoking GT.tab_style() twice.

However, if we had GT.pipe(), we could encapsulate the styling logic in a function and pass that function to GT.pipe(). Here is a rough draft of the idea:

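from typing import Callable

def pipe(gtbl: GT, *callables: Callable[[GT], GT]) -> GT:
    # Apply each callable in order, passing the result on to the next one.
    for callable_ in callables:
        gtbl = callable_(gtbl)
    return gtbl

# Attach the draft to GT for demonstration purposes only (a monkey patch).
GT.pipe = pipe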

def tbl_style(gtbl: GT) -> GT:
    cols = ["land_area_km2", "density_2021"]
    colors = ["lightgray", "lightblue"]
    for col, color in zip(cols, colors):
        gtbl = gtbl.tab_style(
            style=style.fill(color=color),
            locations=loc.body(columns=col, rows=pl.col(col).eq(pl.col(col).max())),
        )
    return gtbl

(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    )
    .tab_header(
        title="The Municipalities of Ontario",
        subtitle="The top 10 highest population density in 2021",
    )
    .tab_stubhead(label="Municipality")
    .fmt_number(columns=["land_area_km2", "density_2021"], decimals=1)
    .cols_label(
        land_area_km2=html("land area, <br>km<sup>2</sup>"),
        density_2021=html("density, <br>people/km<sup>2</sup>"),
    )
    .pipe(tbl_style)
)

With GT.pipe(), we could even pass several functions in a single call, and they would be applied in order. I'm curious whether this is a good idea, or whether there is an existing pattern I have overlooked that achieves the same goal.
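
To make the multi-function case concrete without depending on great_tables, here is a minimal sketch of the same idea. `Table`, `shade_area`, and `shade_density` are hypothetical stand-ins invented for illustration; only the `pipe` logic mirrors the draft above, and it assumes each method returns a table rather than mutating in place.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Table:
    """Hypothetical stand-in for GT; it only records which styles were applied."""

    styles: tuple[str, ...] = ()

    def tab_style(self, style: str) -> Table:
        # Return a modified copy so that calls can be chained.
        return replace(self, styles=self.styles + (style,))

    def pipe(self, *funcs: Callable[[Table], Table]) -> Table:
        # Apply each function in order, feeding the result into the next one.
        tbl = self
        for func in funcs:
            tbl = func(tbl)
        return tbl


def shade_area(tbl: Table) -> Table:
    # Illustrative styling step for one column.
    return tbl.tab_style("land_area_km2:lightgray")


def shade_density(tbl: Table) -> Table:
    # Illustrative styling step for another column.
    return tbl.tab_style("density_2021:lightblue")


# Both functions are passed in one call and run left to right.
result = Table().pipe(shade_area, shade_density)
print(result.styles)
```

One design question this leaves open is the signature: pandas and polars both define `DataFrame.pipe(func, *args, **kwargs)`, which takes a single function and forwards extra arguments to it, whereas the draft here takes several functions and no extra arguments.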

machow commented 1 month ago

This makes sense to me --- @rich-iannone wdyt?

rich-iannone commented 1 month ago

This is a great idea and would certainly be widely appreciated and used!