pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Panic when creating simple nested dataframe with numpy #17531

Open coastalwhite opened 2 months ago

coastalwhite commented 2 months ago

Reproducible example

import polars as pl
import numpy as np

arr2 = np.random.randint(0, 32, size=(10, 1))
arr2 = np.append(arr2, [[None]], axis=0)
df = pl.DataFrame({ 'x': arr2 }, schema={'x': pl.List(pl.Int8)})

Log output

No response

Issue description

Creating this simple DataFrame always panics with a `FixedSizeListArray` error:

thread '<unnamed>' panicked at crates/polars-core/src/series/ops/reshape.rs:159:26:
called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("FixedSizeListArray's child's DataType must match. However, the expected DataType is Unknown while it got FixedSizeBinary(8)."))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/johndoe/Projects/polars/t.py", line 12, in <module>
    df = pl.DataFrame({ 'x': arr2 }, schema={'x': pl.List(pl.Int8)})
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/johndoe/Projects/polars/py-polars/polars/dataframe/frame.py", line 360, in __init__
    self._df = dict_to_pydf(
               ^^^^^^^^^^^^^
  File "/home/johndoe/Projects/polars/py-polars/polars/_utils/construction/dataframe.py", line 159, in dict_to_pydf
    for s in _expand_dict_values(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/johndoe/Projects/polars/py-polars/polars/_utils/construction/dataframe.py", line 388, in _expand_dict_values
    updated_data[name] = pl.Series(
                         ^^^^^^^^^^
  File "/home/johndoe/Projects/polars/py-polars/polars/series/series.py", line 300, in __init__
    self._s = numpy_to_pyseries(
              ^^^^^^^^^^^^^^^^^^
  File "/home/johndoe/Projects/polars/py-polars/polars/_utils/construction/series.py", line 465, in numpy_to_pyseries
    return wrap_s(py_s).reshape(original_shape)._s
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/johndoe/Projects/polars/py-polars/polars/series/series.py", line 6790, in reshape
    return self._from_pyseries(self._s.reshape(dimensions))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("FixedSizeListArray's child's DataType must match. However, the expected DataType is Unknown while it got FixedSizeBinary(8)."))
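
For context, the input array in the reproducible example can be inspected with NumPy alone, without involving Polars. The sketch below only shows what NumPy hands over: appending a row containing `None` turns the integer array into a 2D array with `dtype=object`. How Polars maps that dtype internally is not shown here; the traceback above only tells us that the failure happens in the `reshape` call inside `numpy_to_pyseries`.

```python
import numpy as np

# Same construction as in the reproducible example.
arr2 = np.random.randint(0, 32, size=(10, 1))
print(arr2.dtype, arr2.shape)  # a platform-dependent integer dtype, shape (10, 1)

# Appending a row that holds None makes NumPy fall back to dtype=object.
arr2 = np.append(arr2, [[None]], axis=0)
print(arr2.dtype, arr2.shape)  # object (11, 1)
```

So the array passed to `pl.DataFrame` is a 2D object array, which is consistent with the traceback going through the `reshape(original_shape)` step.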

Expected behavior

No panic

Installed versions

```
--------Version info---------
Polars:               1.1.0
Index type:           UInt32
Platform:             Linux-6.6.32-x86_64-with-glibc2.39
Python:               3.11.9 (main, Apr 2 2024, 08:25:04) [GCC 13.2.0]

----Optional dependencies----
adbc_driver_manager:  0.11.0
cloudpickle:          3.0.0
connectorx:           0.3.3
deltalake:            0.17.4
fastexcel:
fsspec:               2024.3.0
gevent:               24.2.1
great_tables:
hvplot:               0.9.2
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.2.1
pyarrow:              16.0.0
pydantic:             2.6.3
pyiceberg:
sqlalchemy:           2.0.30
torch:
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0
```
coastalwhite commented 1 month ago

Another, more minimal, example:

import polars as pl
import numpy as np

arr = np.array([[None]])
df = pl.DataFrame({ 'x': arr })
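
A possible stopgap, offered as an untested assumption rather than a confirmed fix, is to convert the object array to nested Python lists before handing it to Polars, so that construction does not go through the `numpy_to_pyseries` path seen in the traceback. The sketch below shows only the NumPy side of that conversion.

```python
import numpy as np

# The minimal input from the example above: a 2D object-dtype array.
arr = np.array([[None]])
print(arr.dtype, arr.shape)  # object (1, 1)

# Convert to plain nested Python lists; None values are preserved.
rows = arr.tolist()
print(rows)  # [[None]]
```

`rows` could then be passed in place of the array, for example `pl.DataFrame({'x': rows})`. Whether that avoids the panic on Polars 1.1.0 has not been verified in this thread.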