pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Regression (0.20.2 -> 0.20.3): failure to initialize null column with nested struct dtypes (`ComputeError`) #13602

Open brendancooley opened 9 months ago

brendancooley commented 9 months ago

Reproducible example

Null struct columns initialize fine

import polars as pl

pl.DataFrame().with_columns(pl.lit(None, pl.Struct([pl.Field("c", pl.Int8)])).alias("struct"))

But attempting to initialize a nested struct raises ComputeError

pl.DataFrame().with_columns(pl.lit(None, pl.Struct([pl.Field("a", pl.Struct([pl.Field("c", pl.Int8)]))])).alias("struct"))

Log output

Traceback (most recent call last):
  File ".../test_model.py", line 543, in <module>
    pl.DataFrame().with_columns(pl.lit(None, pl.Struct([pl.Field("a", pl.Struct([pl.Field("c", pl.Int8)]))])).alias("struct"))
  File ".../.venv/lib/python3.10/site-packages/polars/dataframe/frame.py", line 8235, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
  File ".../.venv/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1749, in collect
    return wrap_df(ldf.collect())
polars.exceptions.ComputeError: conversion from `null` to `struct[1]` failed in column 'literal' for 0 out of 1 values: []

Issue description

I believe this PR may be the source of the regression: https://github.com/pola-rs/polars/pull/13326

I will take a closer look and try to isolate the cause.

Expected behavior

With polars==0.20.2 the nested initialization works fine and returns:

shape: (1, 1)
┌───────────┐
│ struct    │
│ ---       │
│ struct[1] │
╞═══════════╡
│ {{null}}  │
└───────────┘

Installed versions

```
--------Version info---------
Polars:               0.20.3
Index type:           UInt32
Platform:             macOS-13.6.1-x86_64-i386-64bit
Python:               3.10.13 (main, Sep 11 2023, 15:37:46) [Clang 12.0.5 (clang-1205.0.22.11)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fsspec:
gevent:
hvplot:
matplotlib:
numpy:                1.26.2
openpyxl:
pandas:
pyarrow:              14.0.2
pydantic:             2.5.3
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:
```

MarcoGorelli commented 7 months ago

Thanks for the report.

According to git bisect, the regression was introduced in #13255:

639c4d51fa9ba300c14662c3ceadeefa86cd8e63 is the first bad commit
commit 639c4d51fa9ba300c14662c3ceadeefa86cd8e63
Author: Ritchie Vink <...>
Date:   Wed Dec 27 09:50:02 2023 +0100

    feat: dispatch strict_cast via cast (#13255)

@ritchie46 just FYI

gab23r commented 1 day ago

This no longer raises on polars 1.10 and produces the expected result. I think this can be closed.