kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Question about custom parameter configuration #1311

Open littletiger123 opened 3 years ago

littletiger123 commented 3 years ago

Hi, I want to pass a custom configuration parameter to my application, so in my sparkApplication.yaml I set:

spec:
  sparkConf:
    "minPartitions": "4"

The Spark operator controller log shows that the custom parameter is included in the spark-submit command:

/opt/spark/bin/spark-submit --class org.example.App --master k8s://https://172.21.0.1:443 --deploy-mode cluster --conf spark.kubernetes.namespace=default --conf spark.app.name=spark-pi --conf spark.kubernetes.driver.pod.name=spark-pi-driver --conf spark.kubernetes.container.image=registry.cn-beijing.aliyuncs.com/chantest/wordcount:v0.19 --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent --conf spark.kubernetes.submission.waitAppCompletion=false --conf minPartitions=4 --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=98677161-3835-4538-a732-3723e6ba7a9c --conf spark.driver.cores=1 --conf spark.kubernetes.driver.limit.cores=1200m --conf spark.driver.memory=512m --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.driver.label.version=3.1.1 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=98677161-3835-4538-a732-3723e6ba7a9c --conf spark.executor.instances=1 --conf spark.executor.cores=1 --conf spark.executor.memory=512m --conf spark.kubernetes.executor.label.version=3.1.1 local:///opt/spark/examples/jars/wordCount-1.0-SNAPSHOT.jar]

However, the Spark driver log shows an exception:

Exception in thread "main" java.util.NoSuchElementException: minPartitions
    at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:246)
    at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:246)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.SparkConf.get(SparkConf.scala:246)
    at org.example.App$.main(App.scala:19)
    at org.example.App.main(App.scala)

Does the Spark operator support custom configuration parameters? And why can't the custom parameter be read in main.scala?

jdonnelly-apixio commented 3 years ago

@littletiger123 sparkConf is for valid spark configuration options, e.g.

spec:
  ...
  sparkConf:
    spark.eventLog.dir: "s3a://my_bucket_name/eventLogFolder"
    spark.eventLog.enabled: "true"
    spark.sql.catalogImplementation: "hive"
    spark.hadoop.fs.s3a.connection.ssl.enabled: "true"
    spark.hadoop.fs.s3a.endpoint: https://s3.us-west-2.amazonaws.com
    spark.hadoop.fs.s3a.fast.upload: "true"
    spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.path.style.access: "true"
    spark.hadoop.hive.input.format: io.delta.hive.HiveInputFormat
    spark.hadoop.hive.metastore.client.connect.retry.delay: "5"
    spark.hadoop.hive.metastore.client.socket.timeout: "1800"
    spark.hadoop.hive.metastore.uris: {{params.hive_metastore_uri}}
    spark.hadoop.hive.tez.input.format: io.delta.hive.HiveInputFormat
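
For context: spark-submit keeps only --conf keys that start with "spark." and ignores the rest with a warning, so minPartitions most likely never reaches the driver's SparkConf, which would explain the NoSuchElementException. Below is a minimal Scala sketch of that filtering. It is a simplified model, not Spark's actual code, and the key spark.myapp.minPartitions and the object name are hypothetical:

```scala
object ConfFilterSketch {
  // Simplified model of spark-submit's behaviour (an assumption for
  // illustration, not Spark's implementation): only keys that start
  // with "spark." survive.
  def keepSparkProperties(conf: Map[String, String]): Map[String, String] =
    conf.filter { case (key, _) => key.startsWith("spark.") }

  def main(args: Array[String]): Unit = {
    val submitted = Map(
      "minPartitions" -> "4",             // bare key from the question
      "spark.myapp.minPartitions" -> "4", // hypothetical namespaced key
      "spark.executor.instances" -> "1"
    )
    val visible = keepSparkProperties(submitted)

    println(visible.get("minPartitions"))             // prints None
    println(visible.get("spark.myapp.minPartitions")) // prints Some(4)
  }
}
```

With a "spark."-prefixed key in spec.sparkConf, the driver could then read the value through SparkConf under that same prefixed name, for example with getInt and a default, instead of conf.get("minPartitions").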

If you want to pass custom args, you can use arguments:

spec:
  ...
  arguments:
  - "500000"

littletiger123 commented 3 years ago

@jdonnelly-apixio Hi, I found this description of sparkConf in the SparkApplication spec:

// SparkConf carries user-specified Spark configuration properties as they would use the "--conf" option in spark-submit.

As far as I know, spark-submit accepts custom parameters after --conf, which is why I tried passing custom args in spec.sparkConf.

If I want to pass custom args as key-value pairs, what should I do? spec.arguments does not seem well suited to that.

usr-av commented 2 years ago

I was able to pass key-value pairs as follows:

  arguments:
    - "arg_1=1111"
    - "arg_2=2222"
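
On the application side, these arrive as plain strings in main's args array, so the app has to split them itself. A minimal Scala sketch of one way to do that follows. The object name, the helper parseKeyValueArgs, and the default of 2 are hypothetical, and arguments without an "=" are simply skipped:

```scala
object ArgParsingSketch {
  // Split each "key=value" argument at the first '=' only,
  // so that values may themselves contain '='.
  def parseKeyValueArgs(args: Array[String]): Map[String, String] =
    args.flatMap { arg =>
      arg.split("=", 2) match {
        case Array(key, value) if key.nonEmpty => Some(key -> value)
        case _                                 => None // not key=value: skip
      }
    }.toMap

  def main(args: Array[String]): Unit = {
    val params = parseKeyValueArgs(args)
    // Fall back to a hypothetical default when the key is absent.
    val minPartitions = params.get("minPartitions").map(_.toInt).getOrElse(2)
    println(s"minPartitions = $minPartitions")
  }
}
```

With arguments such as "minPartitions=4", params.get("minPartitions") would yield the string "4", which the sketch converts to an Int.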