Closed damnMeddlingKid closed 2 years ago
I believe similar ideas about using statistics outside of the CBO have been discussed in the past. We want to avoid using stats for query results or pruning or anything outside CBO because it carries the risk of wrong results. We also want to avoid adding features behind toggles that are not generally safe to use. You can look at using iceberg or delta instead where we can safely use file level min/max statistics to prune splits.
Thanks for the clarification!.
I'd like to propose a feature to allow utilizing per partition column statistics (like min/max) to filter out partitions that don't overlap with the predicate. If partitions don't contain statistics they would not be filtered out.
This behaviour can be controlled by a session variable or table property so that it is only enabled for tables where the user knows that the statistics are fresh and accurate.
I believe it would be beneficial in ETL pipelines where partitions are not updated after they are inserted.