pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Better explanation for low_memory #10974

Open braaannigan opened 1 year ago

braaannigan commented 1 year ago

Problem description

I have often seen people struggling with memory issues (in polars and pandas) who see the low_memory argument and hope it will resolve their issue.

I think the docstrings should explain in more detail what low_memory actually does, and point to streaming as a better solution.

mdavis-xyz commented 7 months ago

> point to streaming as a better solution

But streaming functions (like scan_csv) have low_memory arguments. So they aren't mutually exclusive, are they?

(My use case is that I'm trying to deduplicate a file larger than memory. Even though the query planner says it's a streaming operation, the full dataset gets loaded into memory to sort it. So streaming isn't always streaming.)