Some Spark properties are deployment-related and, with "native" Spark, are typically set through a configuration file or spark-submit command-line options.
These properties are not applied if passed directly to .spec.sparkConf in the SparkApplication custom resource. Indeed, .spec.sparkConf is only intended for properties that affect Spark runtime control, such as spark.task.maxFailures.
Example:
Setting spark.executor.instances in .spec.sparkConf does not affect the number of executors. Instead, we have to set the field .spec.executor.instances in the SparkApplication YAML file.
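To make the distinction concrete, here is a minimal SparkApplication sketch (the name, image, and main class are placeholders; the general shape follows the v1beta2 API):

```yaml
# Sketch: the executor count must go in .spec.executor.instances;
# putting spark.executor.instances in sparkConf has no effect.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi                              # placeholder name
spec:
  type: Scala
  mode: cluster
  image: my-registry/spark:latest             # placeholder image
  mainClass: org.apache.spark.examples.SparkPi  # placeholder class
  sparkConf:
    spark.task.maxFailures: "8"               # runtime-control property: honored here
  executor:
    instances: 5                              # deployment property: must be set here
```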
It would be nice if we could set/override such properties in .spec.sparkConf. We could then easily "templatize" a SparkApplication and set runtime parameters with Spark semantics; in other words, we should be able to move freely between native Spark and Spark Operator semantics.
The affected properties I have identified so far:
- `spark.kubernetes.driver.request.cores`
- `spark.kubernetes.executor.request.cores`
- `spark.kubernetes.executor.deleteOnTermination`
- `spark.driver.cores`
- `spark.executor.cores`
- `spark.executor.instances`
- `spark.kubernetes.container.image`
- `spark.kubernetes.driver.container.image`
- `spark.kubernetes.executor.container.image`
- `spark.kubernetes.container.image.pullPolicy`
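For reference, here is a sketch of where I believe these properties map in the SparkApplication spec (my reading of the v1beta2 API; field names should be double-checked against the operator's CRD reference, and the values are placeholders):

```yaml
spec:
  image: my-registry/spark:latest  # spark.kubernetes.container.image
  imagePullPolicy: IfNotPresent    # spark.kubernetes.container.image.pullPolicy
  driver:
    cores: 1                       # spark.driver.cores
    coreRequest: "500m"            # spark.kubernetes.driver.request.cores
    image: my-registry/spark:latest  # spark.kubernetes.driver.container.image
  executor:
    instances: 2                   # spark.executor.instances
    cores: 2                       # spark.executor.cores
    coreRequest: "1"               # spark.kubernetes.executor.request.cores
    image: my-registry/spark:latest  # spark.kubernetes.executor.container.image
```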
I do not know whether spark.submit.pyFiles and spark.jars are also affected. If they are, it is a problem because these properties are multi-valued: .spec.deps.pyFiles must be an array of strings, while the Spark property is a single string of comma-separated Python dependencies, so in this case it is not easy to switch from the Spark semantics to the Spark Operator logic...
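To illustrate the mismatch, the same dependencies expressed both ways (the file paths are placeholders):

```yaml
# Native Spark semantics: a single comma-separated string.
sparkConf:
  spark.submit.pyFiles: "local:///opt/deps/a.py,local:///opt/deps/b.py"

# Spark Operator semantics: an array of strings.
deps:
  pyFiles:
    - local:///opt/deps/a.py
    - local:///opt/deps/b.py
```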