stackabletech / spark-k8s-operator

Operator for Apache Spark-on-Kubernetes for Stackable Data Platform
https://stackable.tech

Add Spark Iceberg runtime to the spark-k8s image #341

Open · Jimvin opened this issue 5 months ago

Jimvin commented 5 months ago

The Iceberg Spark runtime is not included in the Spark distribution, and it is very useful for anyone working with Iceberg tables. Since it is a commonly used component, I think it should be included in the Stackable spark-k8s Docker image. Including it would allow Spark SQL to work with Iceberg tables out of the box.

At the moment I am loading the required library by adding this to my SparkApplication definition:

kind: SparkApplication
spec:
  deps:
    packages:
      - org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.3
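
For Spark SQL access, having the runtime on the classpath is usually not enough on its own: Iceberg's SQL extensions and at least one Iceberg catalog also have to be configured. The following is a minimal sketch, assuming the operator's `sparkConf` map is passed through to Spark as ordinary Spark properties. The catalog name `demo` and the warehouse path are hypothetical placeholders, and the `hadoop` catalog type is chosen only because it is the simplest to illustrate:

```yaml
kind: SparkApplication
spec:
  deps:
    packages:
      # Iceberg runtime built for Spark 3.4 and Scala 2.12
      - org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.3
  sparkConf:
    # Enables Iceberg's SQL extensions (e.g. CALL stored procedures)
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    # Registers a catalog named "demo" (hypothetical name) backed by Iceberg
    spark.sql.catalog.demo: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.demo.type: hadoop
    # Hypothetical warehouse location; replace with real storage
    spark.sql.catalog.demo.warehouse: s3a://example-bucket/warehouse
```

Tables in that catalog can then be referenced in Spark SQL as `demo.<namespace>.<table>`. If the runtime shipped with the image, the `deps.packages` entry would presumably no longer be needed, and only the `sparkConf` settings would remain.
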

sbernauer commented 5 months ago

The only open question is which Iceberg version we should include.