Checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Running the following script (find_missing_birds.py) in the same directory as birds.csv (attached):
import json

import polars as pl

# Load the attached CSV and keep only the 2021 records.
original = pl.read_csv("birds.csv")
df = original.filter(pl.col("year") == 2021)
print(f"{len(df)} rows after filtering")

# Serialize to row-oriented JSON, then parse it back with the standard library.
as_json = df.write_json(pretty=True, row_oriented=True)
roundtrip = json.loads(as_json)
print(f"{len(roundtrip)} rows after round trip")

# Rebuild a DataFrame from the parsed rows.
restored = pl.DataFrame(roundtrip)
print(f"{len(restored)} restored rows")
produces:
588 rows after filtering
144 rows after round trip
144 restored rows
To confirm that the CSV really contains 588 rows for 2021:
grep ,2021, birds.csv | wc
588 588 34368
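The grep count above matches any line containing ",2021,", whichever column it appears in. A stricter cross-check that compares only the year column could be sketched with the standard library as below. The helper name count_rows_for_year and the sample data are illustrative assumptions; only the column names year and num come from this report.

```python
import csv
import io


def count_rows_for_year(lines, year):
    """Count CSV records whose `year` column equals `year` (hypothetical helper)."""
    reader = csv.DictReader(lines)
    return sum(1 for row in reader if row["year"] == str(year))


# Made-up sample standing in for birds.csv; the `species` column is an assumption.
sample = io.StringIO(
    "species,year,num\n"
    "a,2020,3\n"
    "b,2021,\n"
    "c,2021,7\n"
)
print(count_rows_for_year(sample, 2021))  # prints 2
```

For the real file, the same function can be given an open file handle, for example one obtained from open("birds.csv", newline="").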
I am using Polars 0.20.10 and Python 3.12.1.
Log output
python find_missing_birds.py
avg line length: 58.158203
std. dev. line length: 2.7567902
initial row estimate: 2750
no. of chunks: 8 processed by: 8 threads.
dataframe filtered
Issue description
Performing the equivalent read-filter-convert-roundtrip operation with Pandas 2.2.1 produces the correct result (588 rows).
Expected behavior
The output should be 588 rows. The JSON produced by write_json only includes the first 144 rows. By inspection, I cannot see anything in the dataset that would cause it to stop prematurely: all characters are 7-bit ASCII, and while some num values (the last column of the CSV) are missing, they are well before the point where conversion stops, and the JSON does correctly include null to represent them.
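To narrow down where the output stops, one option is to compare the expected rows with the parsed JSON rows and to scan for non-ASCII values programmatically rather than by eye. The following is a minimal sketch using made-up data. The helper names first_missing_index and non_ascii_rows are hypothetical, and the truncation is simulated rather than produced by Polars.

```python
import json


def first_missing_index(expected_rows, actual_rows):
    """Return the index of the first expected row that is absent or different, else None."""
    for i, row in enumerate(expected_rows):
        if i >= len(actual_rows) or actual_rows[i] != row:
            return i
    return None


def non_ascii_rows(rows):
    """Return indices of rows holding a string value with non-ASCII characters."""
    return [
        i
        for i, row in enumerate(rows)
        if any(isinstance(v, str) and not v.isascii() for v in row.values())
    ]


# Made-up rows; the second has a missing `num`, which JSON represents as null.
expected = [
    {"year": 2021, "num": 1},
    {"year": 2021, "num": None},
    {"year": 2021, "num": 5},
]
# Simulate a truncated round trip that keeps only the first two rows.
truncated = json.loads(json.dumps(expected[:2]))

print(first_missing_index(expected, truncated))  # prints 2
print(non_ascii_rows(expected))  # prints []
```

With the script from the reproducible example, the expected rows could presumably come from df.to_dicts() and the actual rows from roundtrip; the reported index would then point at the first record that write_json dropped.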
birds.csv