stackabletech / spark-k8s-operator

Operator for Apache Spark-on-Kubernetes for Stackable Data Platform
https://stackable.tech

envOverrides have no effect #362

Open sbernauer opened 4 months ago

sbernauer commented 4 months ago

Affected Stackable version

0.0.0-dev

Affected Apache Spark-on-Kubernetes version

3.5.0

Current and expected behavior

Setting envOverrides has no effect; I currently need to resort to podOverrides as a workaround (see the full manifest under Additional context below).

Possible solution

Set them :)
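
A minimal sketch (not the operator's actual code) of what applying them could look like, assuming a plain merge into the k8s-openapi container env; the helper name merge_env_overrides and the sample values in main are hypothetical:

use std::collections::BTreeMap;

use k8s_openapi::api::core::v1::EnvVar;

// Hypothetical helper: merge a role's envOverrides into a container's env
// list. An override replaces an env var of the same name that the operator
// set itself; everything else is appended.
fn merge_env_overrides(env: &mut Vec<EnvVar>, overrides: &BTreeMap<String, String>) {
    for (name, value) in overrides {
        match env.iter_mut().find(|var| var.name == *name) {
            Some(var) => var.value = Some(value.clone()),
            None => env.push(EnvVar {
                name: name.clone(),
                value: Some(value.clone()),
                ..EnvVar::default()
            }),
        }
    }
}

fn main() {
    // Env as the operator might have built it so far (sample value).
    let mut env = vec![EnvVar {
        name: "SPARK_CONF_DIR".to_string(),
        value: Some("/stackable/spark/conf".to_string()),
        ..EnvVar::default()
    }];
    // envOverrides as in the manifest below.
    let overrides = BTreeMap::from([("KERBEROS_REALM".to_string(), "KNAB.COM".to_string())]);
    merge_env_overrides(&mut env, &overrides);
    assert!(env.iter().any(|var| var.name == "KERBEROS_REALM"));
}

Whatever the fix looks like, it would have to run for all three roles (the spark-submit job, the driver and the executor pod templates), since the workaround below has to repeat the podOverrides on each of them.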

Additional context

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: access-hdfs
spec:
  sparkImage:
    productVersion: 3.5.0
  mode: cluster
  mainApplicationFile: local:///stackable/spark/jobs/access-hdfs.py
  deps:
    packages:
      - org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3
  sparkConf:
    spark.driver.extraClassPath: /stackable/config/hdfs
    spark.executor.extraClassPath: /stackable/config/hdfs
    spark.hadoop.hive.metastore.kerberos.principal: hive/hive-iceberg.default.svc.cluster.local@KNAB.COM
    spark.hadoop.hive.metastore.sasl.enabled: "true"
    spark.kerberos.keytab: /stackable/kerberos/keytab
    spark.kerberos.principal: spark/spark.default.svc.cluster.local@KNAB.COM
    spark.sql.catalog.lakehouse: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.lakehouse.type: hive
    spark.sql.catalog.lakehouse.uri: thrift://hive-iceberg:9083
    spark.sql.defaultCatalog: lakehouse
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
  job:
    config:
      volumeMounts: &volumeMounts
        - name: script
          mountPath: /stackable/spark/jobs
        - name: hdfs-config
          mountPath: /stackable/config/hdfs
        - name: kerberos
          mountPath: /stackable/kerberos
        # Yes, I'm too lazy to fiddle around with JVM arguments... (-Djava.security.krb5.conf=/example/path/krb5.conf)
        - name: kerberos
          mountPath: /etc/krb5.conf
          subPath: krb5.conf
    envOverrides: &envOverrides
      KERBEROS_REALM: KNAB.COM
    # As the envOverrides are not working
    podOverrides:
      spec:
        containers:
          - name: spark-submit
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  driver:
    config:
      volumeMounts: *volumeMounts
      resources: # I would like to run this stack on my laptop
        cpu:
          min: 100m
    envOverrides: *envOverrides
    # As the envOverrides are not working
    podOverrides:
      spec:
        containers:
          - name: spark
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  executor:
    replicas: 1
    config:
      volumeMounts: *volumeMounts
      resources: # I would like to run this stack on my laptop
        cpu:
          min: 250m
    envOverrides: *envOverrides
    # As the envOverrides are not working
    podOverrides:
      spec:
        containers:
          - name: spark
            env:
              - name: KERBEROS_REALM
                value: KNAB.COM
  volumes:
    - name: script
      configMap:
        name: access-hdfs-script
    - name: hdfs-config
      configMap:
        name: hdfs
    - name: kerberos
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              secrets.stackable.tech/class: kerberos
              secrets.stackable.tech/kerberos.service.names: spark
              secrets.stackable.tech/scope: service=spark
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: "1"
            storageClassName: secrets.stackable.tech
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: access-hdfs-script
data:
  access-hdfs.py: |
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, LongType, ShortType, FloatType, DoubleType, BooleanType, TimestampType, MapType, ArrayType
    from pyspark.sql.functions import col, from_json, expr

    spark = SparkSession.builder.appName("access-hdfs").getOrCreate()

    spark.sql("show catalogs").show()
    spark.sql("show tables in lakehouse.default").show()

    spark.sql("SELECT * FROM lakehouse.customer_analytics.customers").show()

Environment

No response

Would you like to work on fixing this bug?

None