tanhevg opened this issue 2 weeks ago
You can wrap the expressions in pl.struct() to prevent this:
.over(pl.struct(pl.col('a', 'b'))) # or pl.struct('a', 'b')
There was an earlier issue about .over(regex) having the same problem, https://github.com/pola-rs/polars/issues/12858, but there was no official answer as to whether or not it is a bug.
This isn't a bug. The pl.col('a', 'b') inside over expands the whole expression into multiple expressions:
[
pl.col('i').first().over(pl.col('a')),
pl.col('i').first().over(pl.col('b')),
]
Why does it work then when passing columns as strings? For group_by, both ...('a', 'b') and ...(pl.col('a', 'b')) work and return identical results. This is very confusing. The expansion given above will always throw, because both expressions produce a column named 'i'.
Whether it is a bug or not depends on the definition of a 'bug'. IMHO this is an API inconsistency that either needs to be corrected or must be explained at length in the docs, and even then people will keep stumbling upon it.
In group_by it also expands, into the keys 'a' and 'b', which have distinct names. In a with_columns, an expansion must not lead to duplicate column names.
We have documented expression expansion in our user guide.
Reproducible example
Setup:
This works:
This crashes (pl.col('a', 'b') instead of 'a', 'b'):
Error:
Log output
No response
Issue description
A pl.col() expression with multiple columns passed to 'over' causes a name duplication error. Passing the same columns as string arguments works.
Expected behavior
Expecting a pl.col() expression to yield the same results as passing the columns as strings.
Installed versions