Open sd2k opened 1 year ago
Because we don't have a streaming ndjson reader yet. PR for this would be very welcome.
I'd love to contribute this. Currently trying to understand what's different between CSV/JSON. https://github.com/pola-rs/polars/pull/4382 added a LazyJsonReader and scan_ndjson which seems like a lazy/streaming ndjson reader, but clearly I'm misunderstanding somewhere. It looks like a new source might need adding to polars-pipe?
If there is no streaming ndjson reader, does scan_ndjson have a purpose in its current state? I ran into this when trying to convert a large ndjson file to parquet too.
From the docs, scan_ndjson "allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead."
So it still has a purpose, but unfortunately that doesn't help when trying to do the scan -> sink full file conversion.
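To make the distinction concrete, here is a rough, standard-library-only sketch of the two ideas. It is not how polars implements either of them; the helper name `iter_ndjson_batches` and its parameters are made up for illustration. Dropping rows and columns while parsing stands in for predicate/projection pushdown, and yielding fixed-size batches stands in for what a streaming source would need to do so that a sink can write each batch before the next one is read.

```python
import json
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional


def iter_ndjson_batches(
    lines: Iterable[str],
    batch_size: int,
    columns: Optional[list] = None,
    predicate: Optional[Callable[[dict], bool]] = None,
) -> Iterator[list]:
    """Hypothetical helper: yield lists of parsed rows, at most `batch_size` each."""

    def rows() -> Iterator[dict]:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = json.loads(line)
            # Stand-in for predicate pushdown: drop the row while scanning.
            if predicate is not None and not predicate(row):
                continue
            # Stand-in for projection pushdown: keep only the requested fields.
            if columns is not None:
                row = {name: row.get(name) for name in columns}
            yield row

    it = rows()
    while True:
        # Only one batch is held in memory at a time.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


sample_lines = [
    '{"id": 1, "name": "a", "score": 10}',
    '{"id": 2, "name": "b", "score": 20}',
    '{"id": 3, "name": "c", "score": 30}',
]

# Streaming only: every row and field is kept, but in batches of two.
all_batches = list(iter_ndjson_batches(sample_lines, batch_size=2))

# Pushdown plus streaming: fewer rows and fields survive the scan.
filtered = list(
    iter_ndjson_batches(
        sample_lines,
        batch_size=1,
        columns=["id", "score"],
        predicate=lambda r: r["score"] >= 20,
    )
)
```

With pushdown alone, whatever survives the scan is still collected in one go, which is why it doesn't help a full-file conversion. A streaming source would hand each batch to the sink as it is produced.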
Problem description
The following code gives a (slightly confusing) error:
This diff has an xfailing test, as well as a test showing that the same thing works fine for scan_csv.
I'd like to try and fix this but don't really know why sink_parquet isn't supported for a LazyFrame read using scan_ndjson 🤔