pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`str.json_decode` hangs on nested JSON structure with null values #18781

Closed nsfinkelstein closed 1 month ago

nsfinkelstein commented 1 month ago

Reproducible example

import polars as pl
pl.Series([
    '{"a":[{"b":false}]}',
    '{"a":[{"b":null}]}',
    '{"a":null}',
]).str.json_decode()

Log output

No response

Issue description

The code above, and similar code, hangs indefinitely.

The same code runs correctly on Polars 1.6.0 with all other package versions unchanged.
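
Since the call never returns (and reportedly segfaults on a debug build), it may be safer to reproduce it in a child process with a timeout. The sketch below is only an illustration: `run_with_timeout`, `REPRO`, and the 10-second default are hypothetical names and values chosen here, not part of Polars or of this report.

```python
import subprocess
import sys

# The snippet from the report, kept as a string so it runs in a child process.
REPRO = '''
import polars as pl
pl.Series([
    '{"a":[{"b":false}]}',
    '{"a":[{"b":null}]}',
    '{"a":null}',
]).str.json_decode()
'''


def run_with_timeout(snippet: str, timeout_s: float = 10.0) -> str:
    """Run `snippet` in a fresh interpreter and classify the outcome.

    Returns "ok" on a clean exit, "timeout" if the child had to be killed,
    and "crashed" for any non-zero exit (an uncaught exception, or a signal
    such as a segmentation fault).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "timeout"
    return "ok" if proc.returncode == 0 else "crashed"
```

Calling `run_with_timeout(REPRO)` in an environment with an affected Polars version would presumably return `"timeout"` or `"crashed"` rather than blocking the session; note that it also returns `"crashed"` if Polars is not installed at all.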

Expected behavior

shape: (3,)
Series: '' [struct[1]]
[
        {[{false}]}
        {[{null}]}
        {null}
]
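
As a cross-check that does not depend on Polars, the same three strings can be decoded with the standard-library `json` module. This is only a sketch of the values the struct column is expected to hold; it says nothing about how Polars infers the schema.

```python
import json

# The three input strings from the reproducible example.
rows = [
    '{"a":[{"b":false}]}',
    '{"a":[{"b":null}]}',
    '{"a":null}',
]

# Decode each row independently; JSON null becomes Python None.
decoded = [json.loads(row) for row in rows]

for value in decoded:
    print(value)
```

The first two rows decode to a list containing one object under `a`, and the third has `a` set to `None`, which matches the nesting shown in the expected Series.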

Installed versions

```
--------Version info---------
Polars:              1.7.1
Index type:          UInt32
Platform:            Linux-5.10.219-208.866.amzn2.x86_64-x86_64-with-glibc2.35
Python:              3.11.9 (main, Sep 8 2024, 07:11:41) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager
altair
cloudpickle          3.0.0
connectorx
deltalake
fastexcel
fsspec               2024.9.0
gevent
great_tables
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl
pandas               2.2.2
pyarrow              17.0.0
pydantic             2.8.2
pyiceberg
sqlalchemy           2.0.34
torch                2.3.1+cu121
xlsx2csv
xlsxwriter
```

cmdlineluser commented 1 month ago

This segfaults for me with a debug build.

zsh: segmentation fault

If it helps narrow things down, it seems to have started after this PR:

cmdlineluser commented 1 month ago

This is still reproducible as of 1.8.1.

Hope the ping is OK @ritchie46 - but it seems like this one may need attention if it is not on the radar already.

ritchie46 commented 1 month ago

Ah, I missed it. But.... I coincidentally encountered and fixed it myself! :D

Fixed in #18887