tusharchou / local-data-platform

python library for iceberg lake house on your local
MIT License

0.1.2 Implement Partitioning and Version Control #23

Open tusharchou opened 1 month ago

tusharchou commented 1 month ago

You can optimize the table for queries by partitioning it on relevant fields such as block_timestamp or signer_account_id. Queries that filter on the partition field can then skip unrelated data files, which improves performance by reducing the amount of data scanned. For partitioning:

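from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.transforms import IdentityTransform

# Define partition spec. PyIceberg builds it from PartitionField entries;
# source_id is the schema field ID of the column being partitioned on.
partition_spec = PartitionSpec(
    PartitionField(source_id=schema.find_field("block_timestamp").field_id, field_id=1000, transform=IdentityTransform(), name="block_timestamp")
)

# Create partitioned table
transactions_table = catalog.create_table(
    identifier="near.transactions",
    schema=schema,
    partition_spec=partition_spec,
)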
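
To help groom this, here is a minimal pure-Python sketch of the pruning idea described above. It does not use PyIceberg; the sample rows and the helper names (partition_rows, identity_key, day_key, query_day) are hypothetical and exist only for illustration. It compares partitioning on the raw block_timestamp value with partitioning on the calendar day, assuming queries usually filter by day.

```python
from collections import defaultdict
from datetime import date, datetime, timezone

# Hypothetical sample rows; only the field names come from this issue.
rows = [
    {"block_timestamp": datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc), "signer_account_id": "alice.near"},
    {"block_timestamp": datetime(2024, 1, 1, 13, 30, tzinfo=timezone.utc), "signer_account_id": "bob.near"},
    {"block_timestamp": datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc), "signer_account_id": "alice.near"},
    {"block_timestamp": datetime(2024, 1, 3, 18, 45, tzinfo=timezone.utc), "signer_account_id": "carol.near"},
]


def partition_rows(rows, key_fn):
    """Group rows into partitions keyed by key_fn(row)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key_fn(row)].append(row)
    return dict(partitions)


def identity_key(row):
    """Partition on the raw timestamp value (one partition per distinct value)."""
    return row["block_timestamp"]


def day_key(row):
    """Partition on the calendar day of the timestamp."""
    return row["block_timestamp"].date()


def query_day(partitions, day):
    """Read only the partition for `day`; return (matching rows, rows read)."""
    scanned = partitions.get(day, [])
    return scanned, len(scanned)


by_identity = partition_rows(rows, identity_key)
by_day = partition_rows(rows, day_key)

# With day partitions, a one-day query reads a single partition instead of every row.
matches, rows_read = query_day(by_day, date(2024, 1, 1))

print(f"identity partitions: {len(by_identity)}, day partitions: {len(by_day)}")
print(f"rows read for 2024-01-01: {rows_read} of {len(rows)}")
```

In this toy data the raw timestamp yields one partition per row, while the day key groups rows so a one-day filter touches a single partition. If the real table behaves similarly, a day-based transform (PyIceberg provides DayTransform in pyiceberg.transforms) may be worth considering in place of the identity transform; that is a suggestion to evaluate, not something already decided in this issue.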

tusharchou commented 1 month ago

@mrutunjay-kinagi will you work on this next?

mrutunjay-kinagi commented 1 month ago

Will take a look at it.

tusharchou commented 3 weeks ago

@rakhioza07 as discussed on the call, can you help groom this issue for a better PR?

tusharchou commented 1 day ago

I will pick this up