radanalyticsio / spark-operator

Operator for managing Spark clusters on Kubernetes and OpenShift.
Apache License 2.0

Add a param to enable spawning spark cluster pods on specific Kubernetes nodes #273

Open kimcie opened 4 years ago

kimcie commented 4 years ago

Description: Currently it is not possible to deploy a Spark cluster on selected Kubernetes nodes. All Spark pods are placed by the default scheduler, with no way to constrain which nodes are used.

It would be nice if the spark-operator had a feature for choosing the Kubernetes nodes on which to schedule the Spark cluster. I imagine this would involve adding a new param to the spark cluster config and having the operator map this value to, for example, the nodeSelector of the Spark pod manifests.
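For illustration only, a hypothetical `nodeSelector` field on the SparkCluster resource might look like the sketch below. The field name, its placement in the spec, and the node label are assumptions; nothing like this exists in the current schema.

```yaml
# Hypothetical sketch -- nodeSelector is NOT part of the current
# SparkCluster schema; its name and placement here are assumptions.
apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: my-spark-cluster
spec:
  master:
    instances: "1"
  worker:
    instances: "2"
  # proposed: copied into spec.nodeSelector of the generated Spark pods,
  # so they only land on nodes carrying this label
  nodeSelector:
    node-role.kubernetes.io/spark: "true"
```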

jkremser commented 4 years ago

Hello Mikołaj, would you be interested in contributing this feature? It could be a nice first issue to nail as a new contributor, and it shouldn't be too difficult, as you've mentioned :) A good start would be adding a pod(Anti)Affinity field to the spark cluster, probably for both master and worker, or just one global argument covering both. The field needs to be added to the JSON schema (the Java classes are generated from those JSON schemas), and then the mapping from the Java object to the fabric8 client "fluent" API has to be implemented.
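To make that last step a bit more concrete, here is a rough sketch of how a value coming from the generated Java model (a hypothetical `Map<String, String>` taken from the new schema field) could be pushed into the pod spec via the fabric8 fluent builders. The method and class names are made up for illustration; the real deployer code in the operator will look different.

```java
import java.util.Map;

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;

public class NodeSelectorSketch {

    // Hypothetical helper: 'nodeSelector' would come from the Java class
    // generated out of the spark cluster JSON schema.
    static Pod applyNodeSelector(Pod pod, Map<String, String> nodeSelector) {
        if (nodeSelector == null || nodeSelector.isEmpty()) {
            // nothing requested -> keep the default scheduling behaviour
            return pod;
        }
        return new PodBuilder(pod)
                .editOrNewSpec()
                    // constrains scheduling to nodes carrying these labels
                    .withNodeSelector(nodeSelector)
                .endSpec()
                .build();
    }
}
```

A pod(Anti)Affinity variant would presumably follow the same pattern, just setting the affinity on the pod spec instead of (or in addition to) the node selector.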