pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

shift with default value on a struct does not fill all the fields #14456

Open robinvd opened 9 months ago

robinvd commented 9 months ago

Reproducible example

import polars as pl

df = pl.DataFrame(
    {
        "x": [0, 1, 2],
        "y": [9, 8, 7],
    },
)
df = df.select(data=pl.struct(x=pl.col("x"), y=pl.col("y")))
print(df)

df = df.with_columns(prev=pl.col("data").shift(1, fill_value=0))
print(df)

# df = df.with_columns(prev=pl.col("prev").fill_null(1))
# print(df) # same output

output

shape: (3, 1)
┌───────────┐
│ data      │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {0,9}     │
│ {1,8}     │
│ {2,7}     │
└───────────┘
shape: (3, 2)
┌───────────┬───────────┐
│ data      ┆ prev      │
│ ---       ┆ ---       │
│ struct[2] ┆ struct[2] │
╞═══════════╪═══════════╡
│ {0,9}     ┆ {0,null}  │
│ {1,8}     ┆ {0,9}     │
│ {2,7}     ┆ {1,8}     │
└───────────┴───────────┘

Log output

(empty)

Issue description

When shifting a struct column with a `fill_value`, the fill value is applied only to the first field; the remaining fields are left null.

Expected behavior

Both fields in the struct should be set to the default value 0.

Installed versions

```
❯ python -c 'import polars; polars.show_versions()'
--------Version info---------
Polars:               0.20.7
Index type:           UInt32
Platform:             Linux-6.5.0-14-generic-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fsspec:
gevent:               23.7.0
hvplot:
matplotlib:           3.7.2
numpy:                1.24.4
openpyxl:
pandas:               1.5.3
pyarrow:              13.0.0
pydantic:             1.10.12
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:
```

DeflateAwning commented 3 months ago

Now (on v1.4.1), this code actually raises an exception. Would love to see this as a test case in the repo!

robinvd commented 2 months ago

This can indeed be closed, I think. But how would you shift the whole struct now? :thinking:

cmdlineluser commented 2 months ago

Good question.

This panics:

df.with_columns(prev=pl.col("data").shift(1, fill_value=pl.struct(x=0, y=0)))
# thread '<unnamed>' panicked at crates/polars-core/src/utils/mod.rs:915:5:
# PanicException: expected arrays of the same length