Open Gattocrucco opened 1 week ago
This seems to happen with any expression that can change the length: drop_nulls, drop_nans, arg_true, etc.
df.group_by('a').agg((pl.col.b == pl.col.b).arg_true().get(pl.col.c.first()))
# shape: (2, 2)
# ┌─────┬───────────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ i64 ┆ list[u32] │
# ╞═════╪═══════════╡
# │ 2 ┆ [0, 0] │
# │ 1 ┆ [0, 0] │
# └─────┴───────────┘
Slicing seems to be another possible workaround:
df.group_by('a').agg(pl.col.b.drop_nulls().slice(pl.col.c.first(), 1).first())
# shape: (2, 2)
# ┌─────┬─────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 2 ┆ 8 │
# │ 1 ┆ 6 │
# └─────┴─────┘
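To make the intended per-group semantics concrete without depending on a particular polars version, here is a minimal plain-Python sketch of what the slice workaround computes: one scalar per group, not a list. The rows and the helper name `get_after_drop_nulls` are hypothetical and are not the data from the original report. The sketch also assumes a non-negative index.

```python
from collections import defaultdict

# Hypothetical rows (a, b, c); not the data from the original report.
rows = [
    (1, None, 1),
    (1, 5, 1),
    (1, 6, 1),
    (2, 7, 1),
    (2, None, 1),
    (2, 8, 1),
]


def get_after_drop_nulls(rows):
    """Per group in `a`: drop nulls from `b`, then take the element at the
    index given by the group's first `c`. Returns one scalar per group."""
    groups = defaultdict(list)
    for a, b, c in rows:
        groups[a].append((b, c))

    result = {}
    for a, items in groups.items():
        idx = items[0][1]  # analogue of pl.col.c.first(); assumed >= 0
        non_null = [b for b, _ in items if b is not None]  # analogue of drop_nulls()
        window = non_null[idx:idx + 1]  # analogue of .slice(idx, 1)
        result[a] = window[0] if window else None  # analogue of .first()
    return result


print(get_after_drop_nulls(rows))
```

Each group maps to a single value (or `None` when the index falls past the end of the filtered values), which is the shape the aggregation is expected to have.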
Reproducible example
Log output
Issue description
It seems the output is repeated in a list as long as the number of groups, as if there were a cross product.
My code worked fine in a pre-1.0 version of polars.
Expected behavior
df.group_by('a').agg(pl.col('b').drop_nulls().get(pl.len() // 100))
should produce a scalar column, not a list column.
Installed versions