pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

It is not possible to create Arrays with zero width #16878

Open coastalwhite opened 2 months ago

coastalwhite commented 2 months ago

Reproducible example

import polars as pl
pl.Series([[]], dtype=pl.Array(pl.Int8, 0))

This gives:

polars.exceptions.ComputeError: not all elements have the specified width 0

Log output

No response

Issue description

It is not possible to instantiate zero-width arrays.

Expected behavior

It should be possible to create zero-width Arrays.

Installed versions

```
--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             Linux-6.6.32-x86_64-with-glibc2.39
Python:               3.11.9 (main, Apr 2 2024, 08:25:04) [GCC 13.2.0]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:          3.0.0
connectorx:           0.3.3
deltalake:            0.17.4
fastexcel:
fsspec:               2024.3.0
gevent:               24.2.1
hvplot:               0.9.2
matplotlib:           3.8.4
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.2.1
pyarrow:              16.0.0
pydantic:             2.6.3
pyiceberg:
pyxlsb:               1.0.10
sqlalchemy:           2.0.30
torch:
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0
```

deanm0000 commented 2 months ago

> It should be possible to create zero-width Arrays.

But why?

stinodego commented 2 months ago

I also ran into this when updating the reshape implementation. NumPy allows Arrays with zero dimensions. When I asked Ritchie, he said to keep this constraint of size > 0 for now. But I guess ideally we would support this. Not really a bug though.

Just to note: if we do support this, the reshape implementation can be updated to allow zero dimensions.
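For reference, the NumPy behaviour mentioned above can be sketched as follows. This is only an illustration of NumPy (the variable names are made up for the example) and says nothing about how Polars would implement it.

```python
import numpy as np

# A 2-D array with three rows and a zero-sized second dimension.
zero_width = np.zeros((3, 0), dtype=np.int8)

# The nested-list form from the reproducible example also works in NumPy.
nested = np.array([[]], dtype=np.int8)

# Reshaping is allowed as long as the total element count (0) is preserved.
reshaped = zero_width.reshape(0, 5)

# Concatenating along the zero-sized axis leaves the other operand unchanged.
other = np.arange(6, dtype=np.int8).reshape(3, 2)
joined = np.concatenate([zero_width, other], axis=1)

print(zero_width.shape, nested.shape, reshaped.shape, joined.shape)
```

In NumPy the only requirement for the reshape is that the element count stays the same, so any target shape containing a zero is valid for an empty array.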

cjackal commented 2 months ago

> > It should be possible to create zero-width Arrays.
>
> But why?

If we were to fully support the array counterpart of .list.concat, I guess a length-0 array (acting as the identity) makes sense and would be useful in many applications. For the list counterpart, concatenation with length-0 lists can simply be a no-op.
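The identity argument can be sketched in plain Python, with nested lists standing in for a column of fixed-width rows. The helpers `concat_rows` and `concat_all` are hypothetical names for this illustration and are not Polars API.

```python
from functools import reduce


def concat_rows(left, right):
    """Concatenate two columns row by row; both must have the same number of rows."""
    if len(left) != len(right):
        raise ValueError("columns must have the same number of rows")
    return [l + r for l, r in zip(left, right)]


def concat_all(columns, n_rows):
    """Fold any number of columns together, starting from a zero-width column."""
    identity = [[] for _ in range(n_rows)]  # width 0: the identity for concat_rows
    return reduce(concat_rows, columns, identity)


col_a = [[1, 2], [3, 4]]  # width 2
col_b = [[5], [6]]        # width 1

result = concat_all([col_a, col_b], n_rows=2)  # width 3
empty = concat_all([], n_rows=2)               # width 0, still two rows
print(result, empty)
```

With a zero-width value available, the fold needs no special case for an empty list of inputs, which is the kind of convenience a zero-width Array would give.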

KDruzhkin commented 1 month ago

> I also ran into this when updating the reshape implementation. NumPy allows Arrays with zero dimensions.

Cf. https://github.com/pola-rs/polars/issues/16522.