mrpowers-io / jodie

Delta lake and filesystem helper methods
MIT License

Add getUpdatedPartitions and optimizeUpdatedPartitions #84

Open: zeotuan opened this issue 2 months ago


Add an API, similar to what was implemented in https://github.com/mrpowers-io/levi/issues/23, that returns the recently updated partitions of a Delta table. Also add a second API that runs OPTIMIZE on the table using the result of the first one, since most users will end up writing the same code to translate those partitions into an OPTIMIZE condition.
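A minimal, pure-Scala sketch of the filtering that `getUpdatedPartitions` could perform. To stay runnable without Spark, it takes already-read add actions instead of a table path. `AddAction` is a hypothetical, simplified model of a Delta log `add` action (real ones also carry `path`, `size`, `stats`, and more), and using `modificationTime` as a stand-in for the commit timestamp is an assumption, not how the proposal or levi necessarily does it.

```scala
import java.time.Instant

// Hypothetical, simplified model of a Delta log `add` action.
// modificationTime is epoch milliseconds, as in the Delta log.
final case class AddAction(partitionValues: Map[String, String], modificationTime: Long)

object UpdatedPartitions {
  // Returns the distinct partition-value maps of files added inside the
  // optional [startTime, endTime] window; a missing bound means "unbounded".
  // Assumption: modificationTime stands in for the commit time.
  def getUpdatedPartitions(
      adds: Seq[AddAction],
      startTime: Option[Instant],
      endTime: Option[Instant]
  ): Array[Map[String, String]] = {
    val startMs = startTime.map(_.toEpochMilli).getOrElse(Long.MinValue)
    val endMs = endTime.map(_.toEpochMilli).getOrElse(Long.MaxValue)
    adds
      .filter(a => a.modificationTime >= startMs && a.modificationTime <= endMs)
      .map(_.partitionValues)
      .distinct // keeps the first occurrence of each partition
      .toArray
  }
}
```

Returning `Array[Map[String, String]]` mirrors the proposed signature, so a partitioned-by-`date` table yields entries such as `Map("date" -> "2024-01-02")`.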


```scala
import java.time.Instant
import org.apache.spark.sql.DataFrame

def getUpdatedPartitions(path: String, startTime: Option[Instant], endTime: Option[Instant]): Array[Map[String, String]]

def optimizeUpdatedPartition(path: String, startTime: Option[Instant], endTime: Option[Instant], zOrderCols: Option[Seq[String]] = None): DataFrame
```
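A sketch of the "translate partitions into an optimize condition" step that `optimizeUpdatedPartition` would need, kept as plain string building so it runs without Spark. `OptimizeCondition`, `partitionPredicate`, and `optimizeSql` are hypothetical names. Columns within one partition map are ANDed and the maps are ORed, which fits Delta's rule that an `OPTIMIZE ... WHERE` predicate may only reference partition columns. Empty maps (an unpartitioned table) are dropped, and an empty result yields `None` so the caller can skip the command instead of optimizing the whole table.

```scala
object OptimizeCondition {
  // Quote a value as a SQL string literal, doubling embedded single quotes.
  private def quote(v: String): String = "'" + v.replace("'", "''") + "'"

  // Quote an identifier (column or path) with backticks.
  private def ident(c: String): String = "`" + c.replace("`", "``") + "`"

  // (`a` = 'x' AND `b` = 'y') OR (...); null partition values become IS NULL.
  def partitionPredicate(partitions: Seq[Map[String, String]]): Option[String] = {
    val nonEmpty = partitions.filter(_.nonEmpty).distinct
    if (nonEmpty.isEmpty) None
    else Some(
      nonEmpty.map { p =>
        p.toSeq.sortBy(_._1).map {
          case (c, null) => s"${ident(c)} IS NULL"
          case (c, v)    => s"${ident(c)} = ${quote(v)}"
        }.mkString("(", " AND ", ")")
      }.mkString(" OR ")
    )
  }

  // OPTIMIZE statement for a path-based Delta table, with optional ZORDER BY.
  def optimizeSql(
      path: String,
      partitions: Seq[Map[String, String]],
      zOrderCols: Option[Seq[String]] = None
  ): Option[String] =
    partitionPredicate(partitions).map { pred =>
      val zOrder = zOrderCols
        .filter(_.nonEmpty)
        .map(_.map(ident).mkString(" ZORDER BY (", ", ", ")"))
        .getOrElse("")
      s"OPTIMIZE delta.${ident(path)} WHERE $pred$zOrder"
    }
}
```

The resulting statement could be passed to `spark.sql(...)`, which returns a metrics `DataFrame`. Alternatively, with Delta Lake 2.0 or later, the same predicate could go to `DeltaTable.forPath(spark, path).optimize().where(pred)` followed by `executeCompaction()` or `executeZOrderBy(cols: _*)`; both also return a `DataFrame`, matching the proposed return type.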