tusharchou / local-data-platform

Python library for an Iceberg lakehouse on your local machine
MIT License

0.1.2 Implement Partitioning and Version Control #23

Open tusharchou opened 3 weeks ago

tusharchou commented 3 weeks ago

You can optimize the table for queries by partitioning it on fields that queries commonly filter on, such as block_timestamp or signer_account_id. Partitioning lets the query engine skip data files whose partition values cannot match the filter, which reduces the amount of data scanned. For partitioning:

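from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.transforms import IdentityTransform

# Define partition spec: identity partitioning on block_timestamp
partition_spec = PartitionSpec(
    PartitionField(
        source_id=schema.find_field("block_timestamp").field_id,  # column ID in the table schema
        field_id=1000,  # partition field IDs conventionally start at 1000
        transform=IdentityTransform(),
        name="block_timestamp",
    )
)

# Create partitioned table
transactions_table = catalog.create_table(
    identifier="near.transactions",
    schema=schema,
    partition_spec=partition_spec,
)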
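One thing worth weighing before settling on the spec: identity partitioning creates one partition per distinct value, so a high-resolution column like block_timestamp can end up with nearly one partition per row. Below is a minimal standard-library sketch of that effect. It is a toy model and does not use PyIceberg. The sample rows, the helper names (`day_of`, `partition`, `scan`), and the assumption that block_timestamp holds nanoseconds since the Unix epoch are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample rows; block_timestamp is ASSUMED to be nanoseconds since the Unix epoch.
ROWS = [
    {"signer_account_id": "alice.near", "block_timestamp": 1_700_000_000_000_000_000},
    {"signer_account_id": "bob.near", "block_timestamp": 1_700_000_001_000_000_000},
    {"signer_account_id": "alice.near", "block_timestamp": 1_700_090_000_000_000_000},
]


def day_of(ts_ns: int) -> str:
    """Map a nanosecond timestamp to its UTC calendar day (ISO format)."""
    seconds = ts_ns // 1_000_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()


def partition(rows, key_fn):
    """Group rows by partition key, the way a partitioned table groups data files."""
    parts = defaultdict(list)
    for row in rows:
        parts[key_fn(row)].append(row)
    return dict(parts)


def scan(parts, wanted_key):
    """Read only the partition that matches the key and skip all others."""
    return parts.get(wanted_key, [])


# Identity on the raw timestamp: one partition per distinct value.
identity_parts = partition(ROWS, lambda r: r["block_timestamp"])

# Day granularity: rows from the same UTC day share a partition.
day_parts = partition(ROWS, lambda r: day_of(r["block_timestamp"]))

print(len(identity_parts), "identity partitions")
print(len(day_parts), "day partitions")
print(len(scan(day_parts, "2023-11-14")), "rows read for 2023-11-14")
```

If day-level granularity suits the queries, PyIceberg ships a `DayTransform` in `pyiceberg.transforms` that could replace the identity transform on a timestamp column. Whether it applies directly depends on how block_timestamp is typed in the schema, which the issue does not state.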

tusharchou commented 4 days ago

@mrutunjay-kinagi Will you work on this next?

mrutunjay-kinagi commented 4 days ago

Will take a look at it.