Open deanm0000 opened 2 months ago
The same happens with enum columns.
Some numbers from a real live example:
What I am reading is a ~100 MB Parquet file representing a ~30 GB in-memory frame. The filter matches no rows.
In the first case, the "Assumption" column was written as a string column; in the second case, as a categorical column. The data resides in a Google Cloud bucket in Singapore, which means I/O is costly.
As we see, the missing push-down has a dramatic effect.
Checks
Reproducible example
Log output
Issue description
This is separate from but highly related to https://github.com/pola-rs/polars/issues/18867. Even when using a file written by pyarrow where the statistics are correct, predicate pushdown doesn't work.
If I explicitly make the rhs a Categorical, I don't get a verbose message at all, so I can't tell whether pushdown is silently working or not.
Even with a StringCache enabled, there is still no verbose output.
Expected behavior
Partition pruning should work
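
For reference, here is a minimal pure-Python sketch of what pruning on min/max statistics means for an equality filter. It is not Polars' implementation: `RowGroupStats`, `can_skip` and `row_groups_to_read` are hypothetical names, the values are made up, and it assumes the statistics are stored on the string values of the column (not on dictionary codes).

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max statistics for one column in one row group (hypothetical type)."""
    min_value: str
    max_value: str
    num_rows: int


def can_skip(stats: RowGroupStats, literal: str) -> bool:
    """Return True if `column == literal` cannot match any row in the group."""
    # An equality filter can only match if the literal lies within [min, max].
    return literal < stats.min_value or literal > stats.max_value


def row_groups_to_read(groups: list[RowGroupStats], literal: str) -> list[int]:
    """Indices of the row groups that still have to be fetched and decoded."""
    return [i for i, g in enumerate(groups) if not can_skip(g, literal)]


# Made-up statistics for three row groups of a string-valued column.
groups = [
    RowGroupStats("alpha", "delta", 1000),
    RowGroupStats("echo", "kilo", 1000),
    RowGroupStats("lima", "papa", 1000),
]

print(row_groups_to_read(groups, "golf"))  # [1]
print(row_groups_to_read(groups, "zulu"))  # []
```

The second call corresponds to the situation described above: the filter matches no rows, so with working pruning no row group needs to be read at all, which matters most when the file sits in remote storage.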
Installed versions