j3-signalroom / apache_flink-kickstarter

Examples of Apache Flink® applications showcasing the DataStream API and Table API in Java and Python, featuring AWS, GitHub, Terraform, and Apache Iceberg.
https://linkedin.com/in/jeffreyjonathanjennings
MIT License

In the Java examples, sink to Apache Iceberg tables. #7

Open j3-signalroom opened 1 month ago

j3-signalroom commented 1 month ago

The combination of Apache Flink and Apache Iceberg provides several advantages. Iceberg’s capabilities, including snapshot isolation for reads and writes, the ability to handle multiple concurrent operations, ACID-compliant queries, and incremental reads, enable Flink to perform operations that were traditionally challenging with older table formats. Together, they offer an efficient and scalable platform for processing large-scale data, especially for streaming use cases.

j3-signalroom commented 1 month ago

Find the latest Iceberg release at https://iceberg.apache.org/releases/

j3-signalroom commented 1 month ago

https://aws.amazon.com/what-is/hadoop/

j3-signalroom commented 1 month ago

• iceberg-flink-runtime-1.16-1.3.0.jar (Iceberg-Flink runtime)
• hadoop-common-2.8.3.jar (Hadoop common classes)
• flink-shaded-hadoop-2-uber-2.8.3-10.0.jar (Hadoop AWS classes)
• bundle-2.20.18.jar (AWS SDK bundled classes)

j3-signalroom commented 1 month ago

Move all the methods and all the constants from the deprecated Common class to the KafkaClientPropertiesSource class.
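A minimal sketch of that refactor: the members move to KafkaClientPropertiesSource, and the deprecated Common class delegates to it during the transition. Only the class names Common and KafkaClientPropertiesSource come from this thread; the constant and method shown are hypothetical.

```java
public class KafkaClientPropertiesSource {
    // Example constant migrated from Common (name is illustrative).
    public static final String KAFKA_CLIENT_PROPERTIES_PREFIX = "kafka.client.";

    // Example method migrated from Common (signature is illustrative).
    public static String prefixedKey(String key) {
        return KAFKA_CLIENT_PROPERTIES_PREFIX + key;
    }
}

/** @deprecated use {@link KafkaClientPropertiesSource} instead. */
@Deprecated
class Common {
    // The old entry point delegates to the new class so existing
    // callers keep working until they are migrated.
    static String prefixedKey(String key) {
        return KafkaClientPropertiesSource.prefixedKey(key);
    }
}
```

Keeping the deprecated class as a thin delegate lets callers migrate incrementally instead of in one breaking change.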

j3-signalroom commented 4 weeks ago

The error message Could not execute CREATE DATABASE indicates that there was an issue when trying to create the database db_example in the Iceberg catalog. This could be due to several reasons such as misconfiguration of the Iceberg catalog, lack of permissions, or connectivity issues.

Here are the steps to troubleshoot and resolve this issue:

Steps to fix:

1. Check the Iceberg catalog configuration: Ensure that the Iceberg catalog is correctly configured in your Flink environment, including setting the necessary properties for the catalog.

2. Verify permissions: Ensure that the user has the necessary permissions to create databases in the Iceberg catalog.

3. Check connectivity: Ensure that the Flink job can connect to the Iceberg catalog. This might involve checking network configurations and ensuring that any required services are running.

4. Update the code: Ensure that the database creation statement is correctly formatted and that the catalog is properly set up before executing the SQL statement.
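For the Flink SQL / Table API, registering the catalog before creating the database might look like this; a minimal sketch assuming a Hadoop-type Iceberg catalog with an S3 warehouse (the catalog name and bucket path are placeholders, not from this repo):

```sql
-- Register an Iceberg catalog in the Flink SQL environment.
CREATE CATALOG iceberg_catalog WITH (
  'type' = 'iceberg',
  'catalog-type' = 'hadoop',
  'warehouse' = 's3a://<your-bucket>/warehouse'
);

USE CATALOG iceberg_catalog;

-- Guard against the database already existing.
CREATE DATABASE IF NOT EXISTS db_example;
```

If the CREATE CATALOG statement itself fails, the problem is usually in the catalog properties or missing runtime jars rather than in the CREATE DATABASE statement.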

j3-signalroom commented 3 weeks ago

To configure and use the Snowflake catalog in Apache Iceberg, follow these steps:

Step-by-Step Guide

1. Add Dependencies: Ensure you have the necessary dependencies in your project. If you are using Maven, add the following dependencies to your pom.xml:

<dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-snowflake</artifactId>
    <version>your-iceberg-version</version>
</dependency>
<dependency>
    <groupId>net.snowflake</groupId>
    <artifactId>snowflake-jdbc</artifactId>
    <version>your-snowflake-jdbc-version</version>
</dependency>
2. Configure the Snowflake Catalog: You need to configure the Snowflake catalog in your Iceberg configuration. This typically involves setting properties in a configuration file or programmatically.

Example configuration in a properties file:

iceberg.catalog.my_snowflake_catalog.type=snowflake
iceberg.catalog.my_snowflake_catalog.uri=jdbc:snowflake://<account>.snowflakecomputing.com
iceberg.catalog.my_snowflake_catalog.warehouse=<warehouse>
iceberg.catalog.my_snowflake_catalog.database=<database>
iceberg.catalog.my_snowflake_catalog.schema=<schema>
iceberg.catalog.my_snowflake_catalog.user=<username>
iceberg.catalog.my_snowflake_catalog.password=<password>
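The same key/value pairs can also be assembled programmatically before handing them to the catalog loader; a small sketch using only standard Java (the keys mirror the properties file above, and the values remain placeholders to be supplied by the caller):

```java
import java.util.HashMap;
import java.util.Map;

public class SnowflakeCatalogProperties {
    // Builds the same key/value pairs as the properties file above.
    public static Map<String, String> build(String account, String warehouse,
                                            String database, String schema,
                                            String user, String password) {
        Map<String, String> props = new HashMap<>();
        props.put("type", "snowflake");
        props.put("uri", "jdbc:snowflake://" + account + ".snowflakecomputing.com");
        props.put("warehouse", warehouse);
        props.put("database", database);
        props.put("schema", schema);
        props.put("user", user);
        props.put("password", password);
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = build(
            "<account>", "<warehouse>", "<database>",
            "<schema>", "<username>", "<password>");
        System.out.println(props.get("uri"));
    }
}
```

Building the map in one place keeps the configuration consistent between the properties file and the initialization code that follows.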
3. Initialize the Catalog: Initialize the Snowflake catalog in your application code. Here is an example in Java (the catalog is loaded by its implementation class, and namespace creation goes through the SupportsNamespaces interface):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.iceberg.CatalogUtil;
    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.catalog.Catalog;
    import org.apache.iceberg.catalog.Namespace;
    import org.apache.iceberg.catalog.SupportsNamespaces;
    import org.apache.iceberg.catalog.TableIdentifier;
    import org.apache.iceberg.types.Types;

    public class IcebergSnowflakeExample {
        public static void main(String[] args) {
            // Catalog properties mirror the configuration file above.
            Map<String, String> properties = new HashMap<>();
            properties.put("uri", "jdbc:snowflake://<account>.snowflakecomputing.com");
            properties.put("warehouse", "<warehouse>");
            properties.put("database", "<database>");
            properties.put("schema", "<schema>");
            properties.put("user", "<username>");
            properties.put("password", "<password>");

            // Load the Snowflake catalog by its implementation class.
            Catalog catalog = CatalogUtil.loadCatalog(
                "org.apache.iceberg.snowflake.SnowflakeCatalog",
                "my_snowflake_catalog",
                properties,
                null);

            // Create a namespace (namespace operations live on SupportsNamespaces).
            Namespace namespace = Namespace.of("my_namespace");
            if (catalog instanceof SupportsNamespaces) {
                ((SupportsNamespaces) catalog).createNamespace(namespace);
            }

            // Create a table
            TableIdentifier tableIdentifier = TableIdentifier.of(namespace, "my_table");
            Schema schema = new Schema(
                Types.NestedField.required(1, "id", Types.IntegerType.get()),
                Types.NestedField.required(2, "data", Types.StringType.get())
            );
            PartitionSpec spec = PartitionSpec.unpartitioned();
            Table table = catalog.createTable(tableIdentifier, schema, spec);

            // Use the table
            System.out.println("Table created: " + table);
        }
    }


4. Use the Catalog: Once the catalog is configured and initialized, you can use it to create, read, update, and delete tables in Snowflake using Iceberg APIs.

Summary
Add the necessary dependencies for Iceberg and Snowflake.
Configure the Snowflake catalog with the required properties.
Initialize the catalog in your application code.
Use the catalog to manage tables in Snowflake.
By following these steps, you can configure and use the Snowflake catalog in Apache Iceberg.
j3-signalroom commented 3 weeks ago

https://docs.snowflake.com/en/user-guide/tutorials/create-your-first-iceberg-table