kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Writing SparkApplication yaml file for a Spring Boot web application #1679

Closed wschung1113 closed 1 month ago

wschung1113 commented 1 year ago

Hi all,

I've been struggling for the last couple of days trying to deploy a SparkApplication for our Spring Boot project. I read through the official user guide (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#mounting-configmaps) and the threads under Issues, but I'm still confused about a few things.

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-test
  namespace: sparkapp
spec:
  type: Java
  mode: cluster
  # image: "gcr.io/spark-operator/spark:v3.1.1"
  image: bi33-repo:31313/visualization:spark
  imagePullPolicy: Always
  mainClass: com.tmax.hyperdata.visualanalytics.VisualanalyticsApplication
  # mainApplicationFile: "local:home/catdragon/VisualanalyticsApplication/visualanalytics-0.0.1-SNAPSHOT.jar"
  mainApplicationFile: "local:///opt/spark/jars/visualanalytics-0.0.1-SNAPSHOT.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
```

To start off with, here is my SparkApplication spec. Prerequisites such as the spark-operator and the service account have been set up. I have enabled the webhook in order to overwrite some environment variables for our Spring Boot app.

  1. Do I specify our app's image or gcr.io/spark-operator/spark:v3.1.1?
    • There are not many examples on the web besides the spark-pi tutorial, so I couldn't figure out which image to specify.
  2. If I must specify the location of our app's .jar file in addition to our application image, which seems redundant, is the location fixed? Where should it be? Inside our image?

Those are my two main concerns. I have managed to figure out how to overwrite environment variables by enabling the webhook when creating the spark-operator and by creating a Secret instead of a ConfigMap for the environment variables.
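For context, a minimal sketch of what that part of the spec looks like (the Secret name visualanalytics-env is illustrative, not from my actual manifest, and envFrom in the driver/executor spec assumes an operator version that supports it, with the webhook enabled):

```yaml
spec:
  driver:
    envFrom:
      - secretRef:
          name: visualanalytics-env   # hypothetical Secret holding the env variables
  executor:
    envFrom:
      - secretRef:
          name: visualanalytics-env
```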

If anybody can help me with these concerns or point me to an appropriate link I can study, it'd be awesome. If a similar issue has already been posted, please let me know and I will close this one. Also, please let me know if I can provide any additional information.

Thank you!

OneCricketeer commented 1 year ago

Note: Spark does not run "web apps", and Spring Boot would only be used for configuration here, so no Spring MVC / WebFlux libraries should be included... In other words, there are better configuration-only libraries you could probably use instead of bringing in the Spring Framework.

1 & 2 - You need to build a new image containing your JAR. You can use FROM gcr.io/spark-operator/spark:v3.1.1 as a recommended base... Yes, the file is local to the container, so it will need a COPY statement in your Dockerfile (although spark-submit can also accept files over the http and hdfs protocols, I believe). The file path does not need to be /opt/spark/jars; in fact, it is best not to place files on Spark's own classpath.
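A minimal sketch of such a Dockerfile, assuming the JAR name from the YAML above (the /opt/app destination is an arbitrary choice, just keep it off Spark's classpath):

```dockerfile
# Start from the operator's Spark base image so the Spark distribution
# and the expected entrypoint are already in place.
FROM gcr.io/spark-operator/spark:v3.1.1

# Copy the application JAR into the image. Any directory outside
# /opt/spark/jars works; /opt/app is an arbitrary, illustrative path.
COPY visualanalytics-0.0.1-SNAPSHOT.jar /opt/app/visualanalytics-0.0.1-SNAPSHOT.jar
```

The SparkApplication would then set image: to whatever tag you build and push, and mainApplicationFile: to local:///opt/app/visualanalytics-0.0.1-SNAPSHOT.jar.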

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.