mrpowers-io / levi

Delta Lake helper methods. No Spark dependency.
MIT License
21 stars, 8 forks

Add function on total size of Delta table #2

Open MrPowers opened 1 year ago

MrPowers commented 1 year ago

This function should return the total number of bytes in the Delta table.

puneetsharma04 commented 1 year ago

Hello @MrPowers, I tried writing code to fulfil this requirement. Could you please check the code below and let me know if it is what you are looking for?

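from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Assumes a Spark session that is already configured with the Delta Lake extensions.
spark = SparkSession.builder.getOrCreate()

def get_delta_table_size(path):
    """Return the size in bytes of the current snapshot of the Delta table at `path`."""
    delta_table = DeltaTable.forPath(spark, path)
    # history() has no 'size' column; detail() (Delta Lake 2.1+) exposes sizeInBytes.
    size_in_bytes = delta_table.detail().select("sizeInBytes").collect()[0][0]
    return size_in_bytes

table_size = get_delta_table_size('/path/to/my/delta/table')
print(f"The size of the Delta table is {table_size} bytes.")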
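Since levi is described as having no Spark dependency, a Spark-free variant may be closer to what the issue asks for. The following is only a minimal sketch, not levi's implementation: the function name `delta_table_size_bytes` is hypothetical, the table is assumed to live on a local filesystem, and every JSON commit is assumed to still be present in `_delta_log` (it does not read Parquet checkpoints, so it would undercount a table whose early commits have been cleaned up). It replays the `add` and `remove` actions in commit order and sums the `size` of the files that remain active.

```python
import json
import os
import re

# Commit files are named with a 20-digit, zero-padded version number.
COMMIT_FILE = re.compile(r"\d{20}\.json")


def delta_table_size_bytes(table_path):
    """Sum the sizes of data files still active after replaying the JSON commits.

    Hypothetical helper: assumes a local path and a log with no missing commits.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Zero-padded names mean a lexical sort is also version order.
    commits = sorted(f for f in os.listdir(log_dir) if COMMIT_FILE.fullmatch(f))

    active = {}  # data file path -> size in bytes
    for name in commits:
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)  # one action per line
                if "add" in action:
                    active[action["add"]["path"]] = action["add"]["size"]
                elif "remove" in action:
                    # The file is no longer part of the current snapshot.
                    active.pop(action["remove"]["path"], None)
    return sum(active.values())
```

Calling `delta_table_size_bytes('/path/to/my/delta/table')` would then return the byte count of the latest snapshot only. Files that were removed but not yet vacuumed are not counted.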