pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`read_json` fails on empty objects #10234

Closed cebaa closed 3 weeks ago

cebaa commented 1 year ago

Checks

Reproducible example

import io
import polars

polars.read_json(io.StringIO('{"j":{}}'))

Issue description

The above code fails with:

pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: OutOfSpec("A StructArray must contain at least one field")

Expected behavior

Based on the merged PR https://github.com/pola-rs/polars/pull/6039, I'd assume this would produce an empty struct instead.

Installed versions

```
--------Version info---------
Polars: 0.18.11
Index type: UInt32
Platform: Linux-4.18.0-372.26.1.el8_6.x86_64-x86_64-with-glibc2.28
Python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
----Optional dependencies----
adbc_driver_sqlite:
cloudpickle: 2.0.0
connectorx: 0.3.1
deltalake:
fsspec: 2022.7.1
matplotlib: 3.5.2
numpy: 1.21.5
pandas: 1.4.3
pyarrow: 8.0.0
pydantic: 1.10.2
sqlalchemy: 1.4.39
xlsx2csv:
xlsxwriter:
```
cmdlineluser commented 1 year ago

Not sure how relevant it is, but the actual exception comes from arrow2:

pl.read_json(b'{"j":{}}')

# thread '<unnamed>' panicked at 
# 'called `Result::unwrap()` on an `Err` value: OutOfSpec("A StructArray must contain at least one field")', 
# /Users/user/.cargo/git/checkouts/arrow2-8a2ad61d97265680/d5c78e7/src/array/struct_/mod.rs:122:52

It also happens with empty lists (#7355):

pl.read_json(b'{"j":[]}')
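Since both failing inputs here are empty containers (`{}` and `[]`), one possible stopgap (not an official fix) is to pre-process the JSON with the standard library and replace empty objects and lists with `null` before handing it to polars. This is a minimal sketch: `drop_empty_containers` and `cleaned_json_buffer` are hypothetical helper names, and the replacement changes the data (an empty struct becomes a null rather than the empty struct the issue asks for).

```python
import io
import json
from typing import Any


def drop_empty_containers(value: Any) -> Any:
    """Recursively replace empty dicts and empty lists with None.

    Hypothetical workaround helper: it avoids giving polars a zero-field
    struct or an empty list, at the cost of turning those values into nulls.
    """
    if isinstance(value, dict):
        if not value:
            return None
        return {key: drop_empty_containers(item) for key, item in value.items()}
    if isinstance(value, list):
        if not value:
            return None
        return [drop_empty_containers(item) for item in value]
    # Scalars (str, int, float, bool, None) pass through unchanged.
    return value


def cleaned_json_buffer(raw: str) -> io.StringIO:
    """Parse raw JSON text, null out empty containers, and re-serialize it."""
    return io.StringIO(json.dumps(drop_empty_containers(json.loads(raw))))
```

You would then call something like `pl.read_json(cleaned_json_buffer('{"j":{}}'))`; which dtype polars infers for an all-null column may depend on the version, so treat that as untested.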
DaveParr commented 1 year ago

I'm hit by this too when reading lots of JSON files from a directory. While this gets fixed, is there a way to catch the error effectively? Catching `Exception` doesn't seem to work:

try:
    pl.read_json(b'{"j":{}}')
except Exception as e:
    print("caught exception")
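A likely explanation for why `except Exception` misses it: pyo3 derives its `PanicException` from `BaseException` rather than `Exception` (like `SystemExit`), so a Rust panic is not swallowed by broad handlers. The sketch below uses a hypothetical stand-in class, `FakePanicException`, to show the same behaviour in plain Python without polars.

```python
class FakePanicException(BaseException):
    """Hypothetical stand-in for pyo3's PanicException (also a BaseException subclass)."""


def rust_call_that_panics() -> None:
    # Stands in for pl.read_json(b'{"j":{}}') on an affected polars version.
    raise FakePanicException("A StructArray must contain at least one field")


def try_with(handler_type: type) -> str:
    """Report whether an `except handler_type:` clause catches the panic."""
    try:
        try:
            rust_call_that_panics()
        except handler_type:
            return "caught by " + handler_type.__name__
    except BaseException:
        # The inner handler did not match, so the panic escaped to here.
        return "escaped " + handler_type.__name__
    return "no error"


print(try_with(Exception))      # escaped Exception
print(try_with(BaseException))  # caught by BaseException
```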
cmdlineluser commented 1 year ago

@DaveParr It looks like `PolarsPanicError` does it:

try:
    pl.read_json(b'{"j":{}}')
except pl.PolarsPanicError as e:
    print("caught exception")
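For the many-files case, that handler can wrap each read so one bad file doesn't stop the whole loop. A minimal sketch: `read_json_files` is a hypothetical helper, and the reader and panic exception type are passed in as parameters so the logic can be exercised without polars.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def read_json_files(
    paths: Iterable[Path],
    read: Callable[[Path], T],
    panic_type: Type[BaseException],
) -> Tuple[Dict[Path, T], List[Path]]:
    """Read each file, returning successful results and the paths that panicked."""
    frames: Dict[Path, T] = {}
    failed: List[Path] = []
    for path in paths:
        try:
            frames[path] = read(path)
        except panic_type:
            # Only the panic type is caught; any other error still propagates.
            failed.append(path)
    return frames, failed
```

With polars 0.18 this would be called as `frames, failed = read_json_files(sorted(Path("data").glob("*.json")), pl.read_json, pl.PolarsPanicError)`, where `data` is a placeholder directory. Catching only the panic type keeps genuine I/O or parse errors visible.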
naanselmo commented 8 months ago

This doesn't appear to be unique to JSON. Converting from pandas like this:

import pandas as pd
import polars as pl

df = pd.DataFrame({"a": [{}]})
boom = pl.from_pandas(df)

causes the same issue.

Could it just be a generic issue when converting any empty structure?

cmdlineluser commented 8 months ago

@naanselmo You can open a new issue for that one.