Open stinodego opened 1 year ago
Hi, does the team have a plan to support this? In a lot of cases, when parsing empty json columns from DB, the function panics.
@sibarras Could you give a reproducible example of that panic?
Sure. With SQLite, a JSON column is read into Polars as a str. Decoding that string into a struct then panics.
from sqlite3 import connect

import polars as pl


def main():
    with connect(":memory:") as con:
        df = pl.read_database(
            "SELECT JSON('{}') as json_col;", con
        )  # it works fine, but it's parsed as a string
        print(df)
        df.select(pl.col("json_col").str.json_decode())  # panics here


if __name__ == "__main__":
    main()
This is the output using Python 3.9.18 on WSL2.
shape: (1, 1)
┌──────────┐
│ json_col │
│ --- │
│ str │
╞══════════╡
│ {} │
└──────────┘
thread 'python' panicked at crates/polars-arrow/src/array/struct_/mod.rs:117:52:
called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("a StructArray must contain at least one field"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "/home/samuel_e_ibarra/coding/python/targeting_lib/example.py", line 15, in <module>
main()
File "/home/samuel_e_ibarra/coding/python/targeting_lib/example.py", line 11, in main
df.select(pl.col("json_col").str.json_decode()) # panics here
File "/home/samuel_e_ibarra/coding/python/targeting_lib/.venv/lib/python3.9/site-packages/polars/dataframe/frame.py", line 8124, in select
return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
File "/home/samuel_e_ibarra/coding/python/targeting_lib/.venv/lib/python3.9/site-packages/polars/lazyframe/frame.py", line 1943, in collect
return wrap_df(ldf.collect())
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("a StructArray must contain at least one field"))
I looked into this and empty structs just don't really make much sense. An empty struct column would have to behave somewhat like a Null column as it doesn't contain any Series/values.
We should probably first address https://github.com/pola-rs/polars/issues/3462 before implementing this.
str.json_decode should either error or return a Null column here. I will make a separate issue for that.
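In the meantime, one possible workaround is to turn empty JSON objects into nulls before the strings ever reach Polars. This is only a sketch using the standard library: null_if_empty_object is a hypothetical helper name, not a Polars API, and it assumes the JSON text is available as plain Python strings (for example, rows fetched from the database cursor).

```python
import json
from typing import Optional


def null_if_empty_object(raw: Optional[str]) -> Optional[str]:
    """Return None for JSON text that decodes to {}, otherwise the text unchanged.

    Hypothetical helper for illustration; not part of Polars.
    """
    if raw is None:
        return None
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        # Leave malformed text alone so the downstream decoder can report it.
        return raw
    if isinstance(value, dict) and not value:
        return None
    return raw


rows = ["{}", '{"a": 1}', None, " { } "]
cleaned = [null_if_empty_object(r) for r in rows]
print(cleaned)
```

The cleaned list can then be loaded into a string column and decoded as usual, so the decoder never sees a zero-field object.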
The empty struct also creates issues in read_ndjson and json_decode:
Polars already handles empty structs, but in an inconsistent way. And the inconsistency causes panic exceptions in more complex situations.
import io
import polars as pl
frame = pl.read_ndjson(io.StringIO('{"id": 1, "empty_struct": {}, "list_of_empty_struct": [{}]}'))
print(frame)
for col_name, col_type in frame.schema.items():
    print(f'{col_name:>20} {col_type}')
Output:
shape: (1, 3)
┌─────┬──────────────┬──────────────────────┐
│ id ┆ empty_struct ┆ list_of_empty_struct │
│ --- ┆ --- ┆ --- │
│ i64 ┆ struct[1] ┆ list[struct[0]] │
╞═════╪══════════════╪══════════════════════╡
│ 1 ┆ {null} ┆ [] │
└─────┴──────────────┴──────────────────────┘
id Int64
empty_struct Struct({'': Null})
list_of_empty_struct List(Struct({}))
The expected type of the "empty_struct" column would be pl.Struct({}), but it is pl.Struct({pl.Field('', pl.Null)}).
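To see ahead of time which fields of an NDJSON record would be affected, the record can be walked with the standard library. A rough sketch: find_empty_objects is a hypothetical helper, and the path notation it prints is just one possible convention.

```python
import json


def find_empty_objects(value, path=""):
    """Yield the path of every empty JSON object ({}) nested inside value."""
    if isinstance(value, dict):
        if not value:
            yield path or "<root>"
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield from find_empty_objects(child, child_path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_empty_objects(child, f"{path}[{index}]")


record = json.loads('{"id": 1, "empty_struct": {}, "list_of_empty_struct": [{}]}')
empty_paths = list(find_empty_objects(record))
print(empty_paths)  # ['empty_struct', 'list_of_empty_struct[0]']
```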
I have a requirement to create an empty struct in a DataFrame and later add or rename its fields using struct.with_fields.
However, I was not able to create an empty struct: pl.struct([]) behaves like an empty literal.
Are there any recent approaches?
After https://github.com/pola-rs/polars/pull/18249 we now get:
>>> pl.Series(dtype=pl.Struct)
shape: (0,)
Series: '' [struct[1]]
[
]
Although it produces struct[1] instead of struct[0], which I'm not sure about.
>>> pl.Series(dtype=pl.Struct).dtype
Struct({'': Null})
Problem description
Although perhaps not extremely useful, we should allow structs without any fields for the sake of consistency.
In the current behaviour, Polars conjures up a single unnamed field of type Null:
Trying to create an empty struct through the struct expression results in a PanicException:
Desired behaviour would be: