pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`.struct.field('*')` PanicException when used after `.list.to_struct()` #17092

Open cmdlineluser opened 4 months ago

cmdlineluser commented 4 months ago

Reproducible example

```python
import polars as pl

pl.select(pl.lit('a.b.c').str.split('.').list.to_struct().struct.field('*'))
```

Log output

```
thread '<unnamed>' panicked at crates/polars-plan/src/logical_plan/conversion/expr_expansion.rs:362:22:
index out of bounds: the len is 0 but the index is 0
```

Issue description

I'm not sure whether this can work in a single context at all, because `.list.to_struct()` has an "unknown schema".

Either way, the PanicException itself can be fixed.

Using a separate context works as expected.

```
>>> pl.select(pl.lit('a.b.c').str.split('.').list.to_struct()).select(pl.all().struct.field('*'))
shape: (1, 3)
┌─────────┬─────────┬─────────┐
│ field_0 ┆ field_1 ┆ field_2 │
│ ---     ┆ ---     ┆ ---     │
│ str     ┆ str     ┆ str     │
╞═════════╪═════════╪═════════╡
│ a       ┆ b       ┆ c       │
└─────────┴─────────┴─────────┘
```

Expected behavior

No panic exception.

Installed versions

```
--------Version info---------
Polars: 1.0.0-beta.1
Index type: UInt32
Platform: macOS-13.6.1-arm64-arm-64bit
Python: 3.12.2 (main, Feb 6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]
----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
great_tables:
hvplot:
matplotlib:
nest_asyncio:
numpy: 1.26.4
openpyxl:
pandas: 2.2.1
pyarrow: 15.0.2
pydantic:
pyiceberg:
sqlalchemy:
torch:
xlsx2csv:
xlsxwriter:
```

Crypto-Spartan commented 3 weeks ago

I have this exact same bug.