pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
30.25k stars 1.95k forks source link

passing str + list[Expr] to `agg()` causes PanicException #18706

Open cmdlineluser opened 1 month ago

cmdlineluser commented 1 month ago

Checks

Reproducible example

import polars as pl

df = pl.DataFrame({"x": [1, 2], "y": [3, 4]}).with_row_index()

aggs = [pl.col("y")]

df.group_by("x").agg("index", aggs)
# PanicException: called `Result::unwrap()` on an `Err` value: 
# InvalidOperation(ErrString("`list_builder` operation not supported for dtype `object`"))

Log output

No response

Issue description

There is a similar issue for with_columns https://github.com/pola-rs/polars/issues/17413

I originally has a list of agg exprs but was modifying a querying to add an index column and hit this.

The list can be unpacked as a workaround. .agg("index", *aggs)

Expected behavior

No panic.

Installed versions

``` --------Version info--------- Polars: 1.7.0 Index type: UInt32 Platform: macOS-13.6.1-arm64-arm-64bit Python: 3.12.2 (main, Feb 6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)] ----Optional dependencies---- adbc_driver_manager: cloudpickle: connectorx: deltalake: fastexcel: fsspec: gevent: great_tables: hvplot: matplotlib: nest_asyncio: numpy: 1.26.4 openpyxl: pandas: 2.2.1 pyarrow: 15.0.2 pydantic: pyiceberg: sqlalchemy: torch: xlsx2csv: xlsxwriter: ```
deanm0000 commented 1 month ago

Did it used to work in an older version?

cmdlineluser commented 1 month ago

It doesn't seem so.

Just randomly tried a few versions back to 0.20.10 and it panics there also.

BartSchuurmans commented 1 month ago

The reason .agg(aggs) works is this branch: https://github.com/pola-rs/polars/blob/29cdb17deadf725c8b98bf2bacfde434ba156111/py-polars/polars/_utils/parse/expr.py#L122-L124 which handles the Iterable[IntoExpr] part of the IntoExpr | Iterable[IntoExpr] type.

.agg("index", aggs) does not satisfy this condition (there are 2 inputs), and [pl.col("y")] is converted into an expression representing a literal (list of objects, length 1) value here: https://github.com/pola-rs/polars/blob/29cdb17deadf725c8b98bf2bacfde434ba156111/py-polars/polars/functions/lit.py#L143-L150

Eventually in Rust this causes a panic because the dtype of the list (being a pl.col() object) is not supported.

I suggest adopting the *more_exprs argument style for agg() (and with_columns() and other functions with similar arguments) to remove the ambiguity between literal lists and iterables of IntoExprs: https://github.com/pola-rs/polars/blob/54218e7e35e3defd4b0801e820c56eea6b91e525/py-polars/polars/expr/expr.py#L3377-L3383