Open mcrumiller opened 7 months ago
Yes, I was looking for a good solution here, but I think there is a "problem" in Polars in the agg context which can be solved in two ways:
pl.col(...), which is a big problem imo
list cast of pl.col(...) in agg context
This has often been discussed and I personally think this is a bad idea. This "bug" is just one example.
The user should be responsible for reducing the data in the aggregation to a single row. I think auto-casting pl.col to a list is problematic in many ways; this issue is one example.
The problem is a confusing inconsistency between the select and agg contexts. They are very similar in many ways, but agg has some quirks like this one that make it confusing. Why not make them identical in terms of user experience?
Example:
import polars as pl

df = pl.DataFrame({"A": [1, 1, 1], "B": [1, 2, 4]})
df.select(
    min=pl.col("B").min(),
    max=pl.col("B").max(),
    mean=pl.col("B").mean(),
    implode=pl.col("B").implode(),  # explicit `implode` required!
    implode_array=pl.col("B").implode().list.to_array(3),  # ".list" namespace available
)
shape: (1, 5)
┌─────┬─────┬──────────┬───────────┬───────────────┐
│ min ┆ max ┆ mean     ┆ implode   ┆ implode_array │
│ --- ┆ --- ┆ ---      ┆ ---       ┆ ---           │
│ i64 ┆ i64 ┆ f64      ┆ list[i64] ┆ array[i64, 3] │
╞═════╪═════╪══════════╪═══════════╪═══════════════╡
│ 1   ┆ 4   ┆ 2.333333 ┆ [1, 2, 4] ┆ [1, 2, 4]     │
└─────┴─────┴──────────┴───────────┴───────────────┘
select requires an explicit implode to get a list type and make the ".list" namespace available.
df.group_by("A").agg(
    min=pl.col("B").min(),
    max=pl.col("B").max(),
    mean=pl.col("B").mean(),
    col=pl.col("B"),  # imo this should not work! No auto cast to list
    implode=pl.col("B").implode(),  # this should create a list (not list of list)
    implode_array=pl.col("B").implode().list.to_array(3),  # this should work like in select
)
shape: (1, 7)
┌─────┬─────┬─────┬──────────┬───────────┬─────────────────┬─────────────────────┐
│ A   ┆ min ┆ max ┆ mean     ┆ col       ┆ implode         ┆ implode_array       │
│ --- ┆ --- ┆ --- ┆ ---      ┆ ---       ┆ ---             ┆ ---                 │
│ i64 ┆ i64 ┆ i64 ┆ f64      ┆ list[i64] ┆ list[list[i64]] ┆ list[array[i64, 3]] │
╞═════╪═════╪═════╪══════════╪═══════════╪═════════════════╪═════════════════════╡
│ 1   ┆ 1   ┆ 4   ┆ 2.333333 ┆ [1, 2, 4] ┆ [[1, 2, 4]]     ┆ [[1, 2, 4]]         │
└─────┴─────┴─────┴──────────┴───────────┴─────────────────┴─────────────────────┘
agg works like select in almost everything, but then has this weird auto cast to list type. This causes problems, because there is only a "magic-list-transformation" but not really a list, which makes the ".list" namespace unavailable. agg should match select in requiring explicit implodes, which would be consistent and make the ".list" namespace available immediately.
Checks
Reproducible example
Log output
Issue description
When aggregating a column in a GroupBy context, the list namespace is not accessible.
Expected behavior
Should be able to call .list namespace functions.
Installed versions