posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License

epic: Handle nested data in polars columns #122

Open machow opened 10 months ago

machow commented 10 months ago

Polars supports nested data, such as lists and structs, in columns.

Here's an example from the polars guide:

import polars as pl

url = "https://theunitedstates.io/congress-legislators/legislators-historical.csv"

dtypes = {
    "first_name": pl.Categorical,
    "gender": pl.Categorical,
    "type": pl.Categorical,
    "state": pl.Categorical,
    "party": pl.Categorical,
}

dataset = pl.read_csv(url, dtypes=dtypes).with_columns(
    pl.col("birthday").str.to_date(strict=False)
)

q = (
    dataset.lazy()
    .group_by("first_name")
    .agg(
        pl.count(),
        pl.col("gender"),
        pl.first("last_name"),
    )
    .sort("count", descending=True)
    .limit(5)
)

df = q.collect()
print(df)
┌────────────┬───────┬───────────────────┬───────────┐
│ first_name ┆ count ┆ gender            ┆ last_name │
│ ---        ┆ ---   ┆ ---               ┆ ---       │
│ cat        ┆ u32   ┆ list[cat]         ┆ str       │
╞════════════╪═══════╪═══════════════════╪═══════════╡
│ John       ┆ 1256  ┆ ["M", "M", … "M"] ┆ Walker    │
│ William    ┆ 1022  ┆ ["M", "M", … "M"] ┆ Few       │
│ James      ┆ 714   ┆ ["M", "M", … "M"] ┆ Armstrong │
│ Thomas     ┆ 454   ┆ ["M", "M", … "M"] ┆ Tucker    │
│ Charles    ┆ 439   ┆ ["M", "M", … "M"] ┆ Carroll   │
└────────────┴───────┴───────────────────┴───────────┘

Note that each entry in the gender column is a list of values (categoricals, per the list[cat] dtype, displayed as strings) rather than a single value. However, I don't think Great Tables is set up to handle this situation.

Current Behavior

from great_tables import GT

GT(df).render("html")
ComputeError: cannot cast List type (inner: 'Categorical(Some(global))', to: 'Utf8')

machow commented 8 months ago

Note that List columns can have a schema like List[List[int]], so we need an approach that can handle lists of lists (of lists), and so on. It appears that structs are straightforward to coerce, although a struct can itself contain a list.

Is there a good polars approach for casting List[List[int]] -> String?
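One possible fallback, sketched here as an assumption rather than a polars-native answer, is to do the formatting on the Python side: convert the column to plain Python objects (for example with `Series.to_list()`) and format each entry recursively. The helper name `nested_to_str` and the output format are hypothetical, and the sample rows below are hand-written stand-ins for what a List[List[int]] or struct column might yield.

```python
def nested_to_str(value, sep=", "):
    """Recursively format a nested Python value as a display string.

    Hypothetical helper: lists/tuples stand in for polars List entries,
    dicts stand in for polars Struct entries (after conversion to Python).
    """
    if value is None:
        return "None"
    if isinstance(value, (list, tuple)):
        # Recurse so lists of lists (of lists) are handled at any depth.
        return "[" + sep.join(nested_to_str(v, sep) for v in value) + "]"
    if isinstance(value, dict):
        # A struct can hold a list in one of its fields, so recurse here too.
        items = (f"{k}: {nested_to_str(v, sep)}" for k, v in value.items())
        return "{" + sep.join(items) + "}"
    return str(value)


# Hand-written rows standing in for a List[List[int]] column.
list_rows = [[[1, 2], [3]], [[4]], None]
formatted_lists = [nested_to_str(row) for row in list_rows]

# A struct-like entry that contains a list.
formatted_struct = nested_to_str({"a": 1, "b": [2, 3]})

print(formatted_lists)
print(formatted_struct)
```

This trades speed for generality, since every entry passes through Python, but it does not depend on how deeply the lists are nested.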