Open braaannigan opened 1 year ago
But streaming functions (like `scan_csv`) have `low_memory` arguments. So they aren't mutually exclusive, are they?
(My use case is that I'm trying to deduplicate a file larger than memory. Even though the query planner says it's a streaming operation, the full dataset gets loaded into memory to sort it. So streaming isn't always streaming.)
Problem description
I have often seen people struggling with memory issues (in polars and pandas) who see the `low_memory` argument and hope it will resolve their issue. I think we should explain in the docstrings a little more what `low_memory` actually does and point to streaming as a better solution.