kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Overriding log4j doesn't work for executors if javaOptions are provided #1038


suchitgupta01 commented 3 years ago

Hi Team,

I was trying to override log4j for the driver and executors by creating a config map. The driver logging works fine, but the executors fall back to the default configuration.

I created a config map and added the following to sparkConf:

spark.driver.extraJavaOptions: -Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
spark.executor.extraJavaOptions: -Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
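For reference, a minimal sketch of such a config map, assuming the name test-spark-config referenced by sparkConfigMap in the spec below; the log4j.properties contents here are generic Spark console-logging defaults and are an assumption, not taken from the original setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-spark-config
  namespace: default
data:
  log4j.properties: |
    # Send everything at INFO and above to the console on stderr
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n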

With these changes, the driver logs followed the custom log4j properties, but the executor logs still used the default log4j configuration.

On investigating, I found that if I remove .spec.template.executor.javaOptions, the executor logs pick up the custom log4j configuration.

Is this expected behavior?

My YAML file:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: test-batch-job
  namespace: default
spec:
  schedule: "@every 2m"
  concurrencyPolicy: Forbid
  successfulRunHistoryLimit: 1
  failedRunHistoryLimit: 3
  template:
    type: Scala
    mode: cluster
    image: "test/batch:latest"
    imagePullPolicy: Never
    mainClass: com.explore.BatchJob
    mainApplicationFile: "local:///app/target/scala-2.12/explore-assembly-0.1.jar"
    sparkVersion: "3.0.0"
    restartPolicy:
      type: Never
    sparkConfigMap: test-spark-config
    sparkConf:
      spark.driver.extraJavaOptions: -Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
      spark.executor.extraJavaOptions: -Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
    driver:
      cores: 1
      coreLimit: "1200m"
      memory: "1024m"
      labels:
        version: 3.0.0
      serviceAccount: spark
      javaOptions: "-Dconfig.resource=localkube.conf"
    executor:
      cores: 2
      instances: 2
      memory: "1024m"
      labels:
        version: 3.0.0
      # Disable this and logging works fine
      javaOptions: "-Dconfig.resource=localkube.conf"

Spark: 3.0.0, Scala: 2.12
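A likely explanation (an assumption on my part, not confirmed in this thread) is that the operator renders the javaOptions field into spark.driver.extraJavaOptions / spark.executor.extraJavaOptions when building the submission, so a javaOptions value overwrites whatever extraJavaOptions is set in sparkConf instead of merging with it. If that is the case, one workaround sketch is to fold both flags into the single javaOptions field and drop the duplicated sparkConf entries:

    driver:
      # Both the app config flag and the log4j override in one place
      javaOptions: "-Dconfig.resource=localkube.conf -Dlog4j.configuration=file:/etc/spark/conf/log4j.properties"
    executor:
      javaOptions: "-Dconfig.resource=localkube.conf -Dlog4j.configuration=file:/etc/spark/conf/log4j.properties"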

meetshah15 commented 1 year ago

@suchitgupta01 I am facing a problem with something you seem to have solved.

My log4j.properties file is present at /app/spark/conf/log4j.properties and I have set the following in my SparkConf:

new SparkConf().set("spark.executor.extraJavaOptions", "Dlog4j.configuration=/app/spark/conf/log4j.properties")

But it looks like Spark is unable to locate it. What could be the problem here? Any idea?
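For comparison, the value above differs from the sparkConf entries in the original post in two ways: it lacks the -D prefix that makes it a JVM system property, and it lacks the file: scheme. Assuming the file path given above, the equivalent entry in the operator's sparkConf form would be:

sparkConf:
  spark.executor.extraJavaOptions: -Dlog4j.configuration=file:/app/spark/conf/log4j.properties

Whether that resolves the lookup failure here is an assumption based on the working driver configuration earlier in this thread, not something verified in it.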