mjakubowski84 / parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
https://mjakubowski84.github.io/parquet4s/
MIT License
283 stars, 65 forks

Eager partition filtering. Record filters. #343

Closed mjakubowski84 closed 9 months ago

mjakubowski84 commented 9 months ago
  1. Introduces an experimental RecordFilter, which allows filtering Parquet records based on their index in the file.
  2. A series of improvements to the listing of partitioned directories. The most prominent change is eager filtering of partition directories during traversal of the directory tree. Thanks to that, we can avoid redundant listings of directories which do not match the filter.
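
The idea behind eager partition filtering can be illustrated with a minimal sketch. It is not the code from this PR and does not use the parquet4s API: `Dir`, `DataFile`, `PartitionFilter`, `EagerPartitionListing` and the `listings` counter are hypothetical names, the directory tree is modelled in memory, and the filter is assumed to be a set of independent per-column predicates. Each `key=value` directory is checked against the filter before it is listed, so a non-matching subtree is never descended into.

```scala
// Hypothetical in-memory model of a partitioned directory tree (not parquet4s API).
sealed trait Node
final case class Dir(name: String, children: List[Node]) extends Node
final case class DataFile(name: String) extends Node

object EagerPartitionListing {
  // Assumption: the filter is one predicate per partition column.
  // Columns without a predicate always match.
  type PartitionFilter = Map[String, String => Boolean]

  // Counts how many directories were actually listed (for illustration only).
  var listings: Int = 0

  // Parses "key=value" directory names; anything else is not a partition.
  def parse(name: String): Option[(String, String)] =
    name.split("=", 2) match {
      case Array(k, v) if k.nonEmpty => Some(k -> v)
      case _                         => None
    }

  // Returns paths of data files under `dir` whose partitions match `filter`.
  def list(dir: Dir, filter: PartitionFilter, prefix: String = ""): List[String] = {
    listings += 1
    val path = if (prefix.isEmpty) dir.name else s"$prefix/${dir.name}"
    dir.children.flatMap {
      case DataFile(n) => List(s"$path/$n")
      case sub: Dir =>
        // Eager check: evaluate the filter before listing the subdirectory.
        val matches = parse(sub.name).forall { case (k, v) =>
          filter.get(k).forall(p => p(v))
        }
        if (matches) list(sub, filter, path) else Nil
    }
  }
}
```

With a tree partitioned by `year` and then `month`, a filter on `year` alone prunes every non-matching `year=...` directory at the first level, so none of its `month=...` children are listed. A lazy approach would list the whole tree first and discard the non-matching paths afterwards.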