kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Create a REST API Server to submit Spark application #1455

Open hiboyang opened 2 years ago

hiboyang commented 2 years ago

We want to have a REST server for submitting Spark applications. It would not be much work to add a REST server inside the Spark operator that accepts requests to submit Spark applications. How do people like the idea?

dcharbonnier commented 2 years ago

Can you expand on the use case? In my view, applications should only be deployed with manifests. I don't see a good practice in using a REST API to deploy an application, but I may be wrong.

nrchakradhar commented 2 years ago

The K8S API server itself is a web server: manifests can be POSTed to it, and any operator is, indirectly, a path handler for those requests.
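To make the point above concrete, here is a minimal sketch of what POSTing a SparkApplication manifest straight to the K8s API server looks like. The API group and version (`sparkoperator.k8s.io/v1beta2`) match the operator's CRD; the server address, image, and token are placeholders you would take from your own kubeconfig and cluster.

```python
import json

API_SERVER = "https://kubernetes.default.svc"  # placeholder; use your cluster's address

def sparkapplication_url(namespace: str) -> str:
    # Custom resources are served under /apis/<group>/<version>/namespaces/<ns>/<plural>
    return (f"{API_SERVER}/apis/sparkoperator.k8s.io/v1beta2/"
            f"namespaces/{namespace}/sparkapplications")

# Example SparkApplication manifest; image and paths are illustrative.
manifest = {
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {"name": "spark-pi", "namespace": "default"},
    "spec": {
        "type": "Scala",
        "mode": "cluster",
        "image": "spark:3.5.0",
        "mainClass": "org.apache.spark.examples.SparkPi",
        "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples.jar",
        "sparkVersion": "3.5.0",
        "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
        "executor": {"cores": 1, "instances": 2, "memory": "512m"},
    },
}

body = json.dumps(manifest)
# A real submission would then be, e.g.:
#   curl -X POST -H "Authorization: Bearer $TOKEN" \
#        -H "Content-Type: application/json" \
#        --data "$body" "$(sparkapplication_url default)"
```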

hiboyang commented 2 years ago

The use case is providing Spark as a service, something like GCP DataProc (https://cloud.google.com/dataproc). People could deploy such a service inside their own Kubernetes environment, and Spark users could then use curl (or some other client tool) to submit Spark applications to it. Posting a manifest to the K8S API server is an option; the downside is that it is too complicated for Spark users, who should not need to learn all those K8S details.
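As a sketch of this idea, such a REST server might accept a much smaller request body and translate it into the operator's CRD shape internally. The endpoint path and field names below are entirely invented for illustration; nothing here is part of the operator today.

```python
# Hypothetical server-side translation: a minimal submission request
# (name, image, jar, executor count) mapped onto a SparkApplication manifest.
def to_spark_application(req: dict) -> dict:
    """Translate a simplified submission request into the operator's CRD shape."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": req["name"]},
        "spec": {
            "type": req.get("type", "Scala"),
            "mode": "cluster",
            "image": req["image"],
            "mainApplicationFile": req["jar"],
            "sparkVersion": req.get("sparkVersion", "3.5.0"),
            "executor": {"instances": req.get("executors", 2)},
        },
    }

# A Spark user would then only need something like (endpoint is hypothetical):
#   curl -X POST https://spark-service.example.com/v1/applications \
#        -d '{"name": "spark-pi", "image": "spark:3.5.0", "jar": "..."}'
```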

nrchakradhar commented 2 years ago

> Posting a manifest to the K8S API server is an option; the downside is that it is too complicated for Spark users, who should not need to learn all those K8S details.

The K8S details can be wrapped in a small Go, Python, or shell program. A web server may look simpler initially, but once you add security, high availability, and recovery from failures of either the web server or the K8S Spark operator, it will be quite complex. Managing RBAC for submit and delete operations, and translating it into K8S RBAC, will also not be easy. My personal opinion is that a client-side wrapper is far easier. FYI: Apache Livy, which had partly similar goals in the earlier Big Data world, did not gain much traction and remains in incubation.
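The client-side wrapper alternative might look like the sketch below: a short script that hides the K8S details behind a couple of parameters and hands the generated manifest to `kubectl`, so authorization stays plain K8S RBAC. Function name, defaults, and spec values are illustrative, not an existing tool.

```python
import json
import subprocess

def submit(name: str, image: str, app_file: str, namespace: str = "default",
           dry_run: bool = True) -> dict:
    """Render a SparkApplication manifest and (optionally) apply it via kubectl."""
    manifest = {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": app_file,
            "sparkVersion": "3.5.0",
            "driver": {"serviceAccount": "spark"},
            "executor": {"instances": 2},
        },
    }
    if not dry_run:
        # kubectl accepts JSON manifests on stdin; the user's own credentials
        # and K8S RBAC govern whether the apply is allowed.
        subprocess.run(["kubectl", "apply", "-f", "-"],
                       input=json.dumps(manifest).encode(), check=True)
    return manifest
```

A user would then run something like `submit("spark-pi", "spark:3.5.0", "local:///opt/spark/examples/jars/spark-examples.jar", dry_run=False)` without ever touching raw YAML.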

dcharbonnier commented 2 years ago

This could also be a different project. If you want to provide Spark as a service, you can write a service that creates and manages manifests in your K8s infrastructure, if you choose K8s as the infrastructure for your SaaS. It is probably easier to do it this way; I don't think the operator and this API have much to share.

hiboyang commented 2 years ago

Yes, another option is to create the REST service as a different project. Let's see how it goes. Thanks folks for the feedback!

jdonnelly-apixio commented 2 years ago

We use Airflow, which has an API. It has a Spark-on-K8s Airflow operator that works with this project. You define DAGs that have tasks for submission and monitoring, and the DAGs let you parameterize the Spark app YAML. We also started using Kubeflow, which has Kubeflow Pipelines and the Spark-on-K8s operator. Kubeflow Pipelines are very similar to Airflow.

github-actions[bot] commented 23 hours ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.