Some Spark properties are related to deployment and are typically set through a configuration file or `spark-submit` command-line options with "native" Spark. These properties will not be applied if passed directly to `.spec.sparkConf` in the `SparkApplication` custom resource. Indeed, `.spec.sparkConf` is only intended for properties that affect Spark runtime control, like `spark.task.maxFailures`.

Example: setting `spark.executor.instances` in `.spec.sparkConf` will not affect the number of executors. Instead, we have to set the field `.spec.executor.instances` in the `SparkApplication` YAML file.
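As an illustration, here is an abbreviated, hypothetical manifest (assuming the `v1beta2` API; the name and values are made up) showing both ways of expressing the executor count; only the dedicated field takes effect:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: my-app                # hypothetical name
spec:
  sparkConf:
    # Does not affect the number of executors: this is a
    # deployment property, not a runtime-control one.
    "spark.executor.instances": "10"
  executor:
    # This is the field that actually determines the executor count.
    instances: 10
```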
It would be nice if we could set/override such properties in `.spec.sparkConf`. Thus, we could easily "templatize" a `SparkApplication` and set runtime parameters with Spark semantics. In other words, we should be able to move the cursor as we want between native Spark and Spark Operator semantics.

The concerned properties that I have identified so far:
I do not know if `spark.submit.pyFiles` and `spark.jars` are also concerned. If they are, it is a problem as these properties are multi-valued: `.spec.deps.pyFiles` must be an array of strings, while the Spark property is only a string containing comma-separated Python dependencies, and in this case it is not easy to switch from the Spark semantics to the Spark Operator logic...
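To make the mismatch concrete, here is a sketch (with hypothetical file URIs) of the same Python dependencies expressed under each semantic; mapping one onto the other would require joining or splitting the values on commas:

```yaml
# Spark Operator semantics: .spec.deps.pyFiles is a list of strings
spec:
  deps:
    pyFiles:
      - "local:///opt/deps/utils.py"
      - "local:///opt/deps/helpers.py"
---
# Native Spark semantics: spark.submit.pyFiles is a single
# comma-separated string
spec:
  sparkConf:
    "spark.submit.pyFiles": "local:///opt/deps/utils.py,local:///opt/deps/helpers.py"
```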