Open borchero opened 1 week ago
similar to this one https://github.com/pola-rs/polars/issues/14936.
the tldr of that one: if min=max and null_count=0, don't read any data and just propagate the one known value.
@stinodego happy to try contributing this if you can point me to some documentation on where to touch code when augmenting the projection pushdown logic 🫣
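For illustration, the min=max shortcut from the linked issue could look roughly like the sketch below. This is plain Python and not Polars internals: `ColumnStats`, `constant_value` and `materialize` are hypothetical names made up for this example, and it assumes the statistics cover the whole column.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ColumnStats:
    """Hypothetical container for one column's parquet statistics."""
    min: Any
    max: Any
    null_count: Optional[int]  # None when the writer did not record it


def constant_value(stats: ColumnStats) -> Optional[Any]:
    """Return the single value the column must hold, or None if unknown."""
    if stats.min is None or stats.max is None:
        return None  # no min/max recorded, nothing can be inferred
    if stats.null_count != 0:
        return None  # nulls present (or count unknown): column is not constant
    return stats.min if stats.min == stats.max else None


def materialize(stats: ColumnStats, num_rows: int) -> Optional[List[Any]]:
    """Build the column from statistics alone, or None to signal a real read."""
    value = constant_value(stats)
    if value is None:
        return None
    return [value] * num_rows
```

A `None` result means "fall back to reading the data", so the shortcut only ever skips work when the statistics fully determine the column.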
Description
When saving a dataframe via `write_parquet("...", use_statistics=True)`, I would expect to read only the column statistics from the parquet file. However, judging from execution time and memory consumption, all of the data is read.
Interestingly, this issue even applies to simpler properties that are available in the parquet metadata, e.g.
Would it be possible to push down relevant operations when statistics are available?
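To make the request a bit more concrete, here is a rough sketch of how simple aggregations could be answered from per-row-group statistics without touching the data pages. It is plain Python rather than the actual Polars optimizer: `RowGroupStats` and `aggregate_from_stats` are hypothetical names, and the sketch assumes each row group carries a row count plus optional min, max and null count for the column.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group statistics for a single column."""
    num_rows: int
    min: Any
    max: Any
    null_count: Optional[int]  # None when the writer did not record it


def aggregate_from_stats(groups: List[RowGroupStats]) -> Optional[Dict[str, Any]]:
    """Answer len/null_count/min/max from statistics, or None to force a read."""
    mins, maxs = [], []
    total_rows = 0
    total_nulls = 0
    for g in groups:
        if g.null_count is None:
            return None  # statistics incomplete, cannot push down
        all_null = g.null_count == g.num_rows
        if not all_null and (g.min is None or g.max is None):
            return None  # values exist but their bounds were not recorded
        total_rows += g.num_rows
        total_nulls += g.null_count
        if not all_null:
            mins.append(g.min)
            maxs.append(g.max)
    return {
        "len": total_rows,
        "null_count": total_nulls,
        "min": min(mins) if mins else None,  # None when every value is null
        "max": max(maxs) if maxs else None,
    }


# Usage: three row groups, the last one entirely null.
groups = [
    RowGroupStats(num_rows=3, min=1, max=5, null_count=0),
    RowGroupStats(num_rows=2, min=-4, max=2, null_count=1),
    RowGroupStats(num_rows=2, min=None, max=None, null_count=2),
]
summary = aggregate_from_stats(groups)
```

Returning `None` whenever any row group lacks statistics keeps the pushdown conservative: the query would then fall back to the regular read path.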