tusharchou / local-data-platform

python library for iceberg lake house on your local
MIT License

0.1.1 Insert Data into Iceberg #21

Closed tusharchou closed 1 month ago

tusharchou commented 1 month ago

Once the data has been fetched from BigQuery and the Iceberg table exists, the dataset can be appended to that table for storage.

Load the Iceberg table

from pyiceberg.table import Table
table = catalog.load_table("near.transactions")

Write data into Iceberg (converting the Pandas DataFrame to a PyArrow table, which is what PyIceberg's append accepts)

import pyarrow as pa
arrow_table = pa.Table.from_pandas(transactions_df, preserve_index=False)

Append the data to the Iceberg table

table.append(arrow_table)
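
The load, convert, and append steps can be folded into one helper. This is a minimal sketch, not the project's implementation. It assumes PyIceberg 0.6 or later, where `Table.append` accepts a PyArrow table, and it assumes the DataFrame's columns and types already match the Iceberg table schema. The function name `append_pandas_to_iceberg` and its parameters are hypothetical.

```python
def append_pandas_to_iceberg(catalog, identifier, df):
    """Append a pandas DataFrame to an existing Iceberg table.

    Hypothetical helper: `catalog` is an already configured PyIceberg
    catalog, `identifier` is a table name such as "near.transactions",
    and `df` is a pandas DataFrame matching the table schema.
    """
    # Imported lazily so the module can be loaded without pyarrow installed.
    import pyarrow as pa

    # Load the existing Iceberg table from the catalog.
    table = catalog.load_table(identifier)

    # PyIceberg appends PyArrow tables, so convert the DataFrame first.
    # preserve_index=False keeps the pandas index out of the Arrow schema.
    arrow_table = pa.Table.from_pandas(df, preserve_index=False)

    # Commit the rows as a new append snapshot on the table.
    table.append(arrow_table)

    # Report how many rows were handed to the append.
    return arrow_table.num_rows
```

With the names used in this issue, a call might look like `append_pandas_to_iceberg(catalog, "near.transactions", transactions_df)`. If the append is rejected, a schema mismatch between the DataFrame and the table is the first thing to check.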

tusharchou commented 1 month ago

26 https://github.com/tusharchou/local-data-platform/blob/b00fe293acec0c4c5fb116c8f6ae9c2e3df1caf9/local-data-platform/nyc_yellow_taxi.py#L76