mrpowers-io / levi

Delta Lake helper methods. No Spark dependency.
MIT License

Add function that returns number of bytes skipped and number of files skipped for given predicates #3

Closed MrPowers closed 1 year ago

MrPowers commented 1 year ago

For a given set of predicates, this function should return the total number of bytes skipped and the number of files skipped.

The function could be something like this:

levi.skipped_stats(delta_table, filters=[[('col1', '==', 0), ('col2', '>', 5)]])

# return value
{
  "num_files_skipped": 45,
  "num_bytes_skipped": 12341234
}

This function lets users explore how much file skipping they get from the predicates they apply.

P.S. The actual implementation can be different. This is just to give an idea.
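
To make the idea concrete, here is a minimal sketch of the skipping logic in plain Python. It is not levi's implementation. It assumes the per-file statistics have already been read from the Delta transaction log into a list of dicts, and the `size_bytes`, `min`, and `max` keys are hypothetical names chosen for this example. Filters are read in the same shape as the call above: the outer list is OR, each inner list is AND. A file counts as skipped only when its min/max values prove that no row can satisfy any of the AND groups.

```python
from typing import Any, Dict, List, Tuple

Predicate = Tuple[str, str, Any]  # (column, operator, value)


def _may_match(pred: Predicate, mins: Dict[str, Any], maxs: Dict[str, Any]) -> bool:
    """Return False only when min/max stats prove no row in the file can match."""
    col, op, value = pred
    lo, hi = mins.get(col), maxs.get(col)
    if lo is None or hi is None:
        return True  # no stats for this column: the file cannot be ruled out
    if op in ("==", "="):
        return lo <= value <= hi
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    return True  # unsupported operator (e.g. "!="): stay conservative


def skipped_stats(files: List[Dict[str, Any]], filters: List[List[Predicate]]) -> Dict[str, int]:
    """Count files and bytes that can be skipped for the given filters.

    `files` is a hypothetical pre-extracted list of per-file stats, one dict per
    data file with the keys `size_bytes`, `min`, and `max`.
    """
    num_files = 0
    num_bytes = 0
    for f in files:
        # The file must be read if at least one AND group might match.
        might_match = any(
            all(_may_match(p, f["min"], f["max"]) for p in group)
            for group in filters
        )
        if not might_match:
            num_files += 1
            num_bytes += f["size_bytes"]
    return {"num_files_skipped": num_files, "num_bytes_skipped": num_bytes}


# Made-up stats for three files, only to exercise the function.
example_files = [
    {"size_bytes": 100, "min": {"col1": 0, "col2": 1}, "max": {"col1": 0, "col2": 4}},
    {"size_bytes": 200, "min": {"col1": 0, "col2": 3}, "max": {"col1": 2, "col2": 9}},
    {"size_bytes": 300, "min": {"col1": 1, "col2": 6}, "max": {"col1": 5, "col2": 8}},
]
result = skipped_stats(example_files, [[("col1", "==", 0), ("col2", ">", 5)]])
print(result)  # {'num_files_skipped': 2, 'num_bytes_skipped': 400}
```

The first file is skipped because its `col2` max is 4, the third because its `col1` min is 1. The second file has to be read. A real version would take the `delta_table` argument and pull these stats from the table's add actions instead of a hand-built list.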