Closed schmelczer closed 3 days ago
@nameexhaustion I think we can remove the low-memory reader in `polars-pipe`. It seems to be wrong, and it is also slow as we memmove the bytes multiple times. I think the `mmap` reader is already low-memory as we memmap.
WDYT?
I think we can remove it for now. Later we may revisit some of the approaches the low-memory reader uses, depending on how we read bytes from cloud storage for async.
Checks
Reproducible example
The above prints an unexpectedly empty dataframe:
Log output
Issue description
I'd like to read a large CSV (with 10 small columns) using `scan_csv` in streaming mode. This script is meant to run in a resource-constrained environment, so I set `low_memory=True`; however, this results in no rows being read. The schema is still correctly inferred, but the returned DataFrame contains 0 rows. Setting `low_memory` to `False` solves the problem.
Expected behavior
I'd expect to get the same dataframe regardless of whether `low_memory` is `True` or `False`.
Rerunning the above example with `low_memory=False` produces the result we'd expect:
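The comparison described in this report can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script from the report: the synthetic file, the `col_*` column names, the row count and both helper functions are hypothetical, and it assumes streaming mode is requested with `collect(streaming=True)`, which newer Polars releases may spell differently.

```python
import csv
import os
import tempfile


def write_sample_csv(path, n_rows=1_000, n_cols=10):
    """Write a synthetic CSV with n_cols small integer columns (hypothetical data)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"col_{i}" for i in range(n_cols)])
        for row in range(n_rows):
            writer.writerow([row * n_cols + i for i in range(n_cols)])


def row_counts(path):
    """Collect the same lazy scan with low_memory off and on; return both row counts."""
    # Third-party dependency, imported lazily so write_sample_csv stays stdlib-only.
    import polars as pl

    counts = {}
    for low_memory in (False, True):
        lf = pl.scan_csv(path, low_memory=low_memory)
        # Assumption: streaming mode is requested via collect(streaming=True).
        counts[low_memory] = lf.collect(streaming=True).height
    return counts


def check_low_memory_consistency(n_rows=1_000):
    """Return True when both settings read every row of a freshly written file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        write_sample_csv(path, n_rows=n_rows)
        counts = row_counts(path)
    return counts[False] == counts[True] == n_rows
```

With the behavior reported here, `check_low_memory_consistency()` would be expected to return `False`, since the `low_memory=True` scan yields 0 rows; once both settings read the same data it should return `True`.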
Installed versions