kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Support of dynamic resource allocation #641

Open yuanchen8911 opened 5 years ago

yuanchen8911 commented 5 years ago

This is an important feature. It seems the current implementation simply kills the existing application and restarts a new one. Any update or plan on improving it, e.g., incremental allocation? Thanks.

Quote from the user guide.

"If the application is currently running, the operator kills the running application before submitting a new run with the updated specification. There is planned work to enhance the way SparkApplication updates are handled. For example, if the change was to increase the number of executor instances, instead of killing the currently running application and starting a new run, it is a much better user experience to incrementally launch the additional executor pods."

liyinan926 commented 5 years ago

First of all, to clarify a bit on this, dynamic resource allocation is a Spark built-in feature to dynamically change the number of executors at runtime according to the task workload. I think here you are referring to the way the operator handles changes to the number of executors in a SparkApplication object. Ideally that would not require an application restart. However, currently we are not able to achieve that because the executor pods are requested and managed by the driver, which serves as a custom k8s controller for the executor pods. Unless we have a way to plug into the driver and gain control over executor pod provisioning and management, there's really no way we can achieve that. The user guide is too forward-looking on this.

Regarding dynamic resource allocation, we do plan to support that. With dynamic resource allocation, spec.executor.instances in a SparkApplication object is the initial number of executor pods to request. The actual number may change dynamically at runtime.
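To make the distinction concrete, here is a rough sketch of what enabling Spark's built-in dynamic allocation could look like in a SparkApplication via sparkConf, assuming a Spark version that supports dynamic allocation on Kubernetes; the property names are standard Spark configuration keys and the values are illustrative:

```yaml
spec:
  executor:
    instances: 2                                   # with dynamic allocation, only the initial request
  sparkConf:
    "spark.dynamicAllocation.enabled": "true"
    "spark.dynamicAllocation.minExecutors": "2"
    "spark.dynamicAllocation.maxExecutors": "10"   # driver scales executors within these bounds at runtime
```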

yuanchen8911 commented 5 years ago

Yes, I'm aware of the Spark built-in support, and my question was whether we could dynamically update the number of executors through the Spark operator (i.e., by updating the number of instances in a SparkApplication object). To avoid a restart, can we use a webhook to intercept and handle the Spark driver's scheduling requests, similar to the mechanism Volcano uses to integrate its batch scheduler with the operator?

By saying "Regarding dynamic resource allocation, we do plan to support that", did you mean you would build dynamic allocation logic similar to the Spark driver's into the operator (i.e., automated and dynamic adjustment of the pod number)? I think the problem stays the same. Can we do it without restarting the job or the running pods?

liyinan926 commented 5 years ago

> Yes, I'm aware of the Spark built-in support, and my question was whether we could dynamically update the number of executors through the Spark operator (i.e., by updating the number of instances in a SparkApplication object). To avoid a restart, can we use a webhook to intercept and handle the Spark driver's scheduling requests, similar to the mechanism Volcano uses to integrate its batch scheduler with the operator?

We can use the webhook to configure newly created executor pods if we can inform the driver of the updated number of executors at runtime without restarting it. For example, ideally spark-submit would have a subcommand for updating certain properties of the job without restarting it. I am not aware of a way to do that. Do you know if there's any way to do that?

> By saying "Regarding dynamic resource allocation, we do plan to support that", did you mean you would build dynamic allocation logic similar to the Spark driver's into the operator (i.e., automated and dynamic adjustment of the pod number)? I think the problem stays the same. Can we do it without restarting the job or the running pods?

We do have a plan to build a Kubernetes-native external shuffle service that works with the new shuffle API currently being worked on. However, what I meant was more for the operator to be able to automatically configure an application to use an existing shuffle service. With dynamic resource allocation, the driver decides the number of active executors at runtime and adds new or removes existing executors based on that without needing a restart.

k82cn commented 4 years ago

> We do have a plan to build a Kubernetes-native external shuffle service that works with the new shuffle API currently being worked on.

Any details on that? "Dynamic resource allocation" is also interesting to me :)

yuanchen8911 commented 4 years ago

@k82cn Do you guys have any plan to support a dynamic number of pods for a batch job on the Volcano side?

k82cn commented 4 years ago

> Do you guys have any plan to support a dynamic number of pods for a batch job on the Volcano side?

Oh, we have plans for both the Spark operator side and the Volcano side :)

liyinan926 commented 4 years ago

> Any details on that? "Dynamic resource allocation" is also interesting to me :)

It looks like the new shuffle API changes necessary for implementing the external shuffle service won't be completed in Spark 3.0, making it impossible to build a pluggable Kubernetes-native shuffle service even on Spark 3.0. One alternative is discussed in https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/616.

liyinan926 commented 4 years ago

A limited form of dynamic allocation via shuffle tracking will be supported in https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/pull/976.
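For readers landing here later: the shuffle-tracking form of dynamic allocation can also be enabled purely through standard Spark 3.0+ properties passed via sparkConf, independent of whatever fields that PR adds; a sketch with illustrative values:

```yaml
spec:
  sparkConf:
    "spark.dynamicAllocation.enabled": "true"
    "spark.dynamicAllocation.shuffleTracking.enabled": "true"   # Spark 3.0+; avoids needing an external shuffle service
    "spark.dynamicAllocation.minExecutors": "1"
    "spark.dynamicAllocation.maxExecutors": "10"
```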

alculquicondor commented 3 years ago

Spark 3 (or maybe 3.1) supports dynamic allocation now (without an external shuffle service).

However, the current defaulting sets the number of instances to 1

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/6f66e3f7e851024fc767f4bbe6ba6796c0b468cd/pkg/apis/sparkoperator.k8s.io/v1beta2/defaults.go#L73

Doesn't that disable the effects of dynamic allocation?