pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

panic on forward/backward fill on struct #15782

Closed by jr200 1 month ago

jr200 commented 4 months ago

Reproducible example

import os
# os.environ["RUST_BACKTRACE"]="full"
os.environ["POLARS_VERBOSE"]="1"

import polars as pl

(
    pl.DataFrame([
        pl.Series("A", [1]),
        pl.Series("B", ["X"]),
        pl.Series("s", [{"u": 'y'}], dtype=pl.Struct({"u": pl.Categorical})),
    ])
    .with_columns(pl.col("s").forward_fill().over("B"))
)

Log output

PanicException                            Traceback (most recent call last)
Cell In[5], line 13
      3 os.environ["POLARS_VERBOSE"]="1"
      5 import polars as pl
      7 (
      8     pl.DataFrame([
      9         pl.Series("A", [1]),
     10         pl.Series("B", ["X"]),
     11         pl.Series("s", [{"u": 'y'}], dtype=pl.Struct({"u": pl.Categorical})),
     12     ])
---> 13     .with_columns(pl.col("s").forward_fill().over("B"))
     14 )

File ~/code/.venv/lib/python3.11/site-packages/polars/dataframe/frame.py:7874, in DataFrame.with_columns(self, *exprs, **named_exprs)
   7728 def with_columns(
   7729     self,
   7730     *exprs: IntoExpr | Iterable[IntoExpr],
   7731     **named_exprs: IntoExpr,
   7732 ) -> DataFrame:
   7733     """
   7734     Add columns to this DataFrame.
   7735 
   (...)
   7872     └─────┴──────┴─────────────┘
   7873     """
-> 7874     return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)

File ~/code/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py:1708, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager)
   1705 if background:
   1706     return InProcessQuery(ldf.collect_concurrently())
-> 1708 return wrap_df(ldf.collect())

PanicException: called `Option::unwrap()` on a `None` value

Issue description

Polars panics when calling forward_fill() or backward_fill() on a struct column.

interpolate() also panics, but I'm not sure what sensible behaviour would be there. I get a different error when I remove the .over() expression from the snippet above.

Expected behavior

Forward/backward fill should propagate the nearest preceding (or following) non-null struct value into the null rows.
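
To make the expected semantics concrete, here is a minimal pure-Python sketch of a forward fill applied within groups, which is what `.forward_fill().over("B")` is expected to do. This is only an illustration, not how Polars implements it: `forward_fill_over` is a hypothetical helper, struct values are modelled as plain dicts, and nulls as `None`.

```python
from typing import Any, Hashable, Optional, Sequence


def forward_fill_over(
    values: Sequence[Optional[Any]], groups: Sequence[Hashable]
) -> list:
    """Forward-fill None entries within each group, preserving row order.

    Hypothetical helper for illustration only; not part of Polars.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")

    last_seen: dict = {}  # group key -> most recent non-null value in that group
    filled = []
    for value, key in zip(values, groups):
        if value is not None:
            last_seen[key] = value
        # Rows with no earlier non-null value in their group stay None.
        filled.append(last_seen.get(key))
    return filled


# Struct values modelled as dicts; None marks a null struct.
structs = [{"u": "y"}, None, None, {"u": "z"}, None]
groups = ["X", "X", "W", "X", "W"]

result = forward_fill_over(structs, groups)
print(result)
```

Under the same assumptions, a backward fill would be the same pass run over the rows in reverse order, with the output reversed again afterwards.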

Installed versions

```
--------Version info---------
Polars: 0.20.21
Index type: UInt32
Platform: macOS-14.1.1-x86_64-i386-64bit
Python: 3.11.3 (main, Mar 12 2024, 20:00:56) [Clang 15.0.0 (clang-1500.3.9.4)]
----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
hvplot: 0.9.2
matplotlib:
nest_asyncio: 1.6.0
numpy: 1.26.4
openpyxl:
pandas: 2.2.1
pyarrow: 15.0.2
pydantic:
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:
```

cmdlineluser commented 4 months ago

Can reproduce.

There seem to be two issues here. The first is the current error:

thread 'polars-2' panicked at crates/polars-core/src/series/from.rs:330:25:
called `Option::unwrap()` on a `None` value

If we remove the Categorical dtype, we instead hit the "not yet implemented" panic from fill_null on structs:

thread 'polars-3' panicked at crates/polars-core/src/chunked_array/ops/fill_null.rs:91:18:
not yet implemented

stevebuildboats commented 1 month ago

relates to #17830