ukaea / piezo


Feature/add tolerations to executor pods #137 #143

Open oliver-tarrant-tessella opened 5 years ago

oliver-tarrant-tessella commented 5 years ago

Add tolerations to the executor pods created through Piezo. These pods will then be allowed to run on any node with a matching taint, while other pods will not. Keeping a selection of nodes just for executors should ensure that the system doesn't hang waiting for resources to run Spark jobs. Note that before the tolerations have any effect, a taint must be applied to the nodes that are to be designated as executor nodes. This taint is applied by running: kubectl taint nodes {node name} piezoRestriction=executors:NoSchedule (other taints could be used, but you would also need to change the default values in the validation rules to match).
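For reviewers less familiar with taints and tolerations, here is a minimal sketch of the matching rule the description relies on. It is not the code in this branch: the function names are hypothetical, the dict fields follow the standard Kubernetes toleration schema, and the matching logic is a simplified version of what the scheduler does for a single taint.

```python
# Illustrative sketch only; not taken from the Piezo code base.
# The dict keys mirror the Kubernetes toleration fields (key, operator, value, effect).

def executor_toleration(key="piezoRestriction", value="executors", effect="NoSchedule"):
    """Return a toleration that matches the taint key=value:effect."""
    return {"key": key, "operator": "Equal", "value": value, "effect": effect}


def tolerates(toleration, taint):
    """Simplified Kubernetes rule for whether one toleration matches one taint."""
    # A toleration with an empty effect matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # With Exists, an empty key matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # With Equal, both key and value must be identical.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))


# The taint applied by the kubectl command in this description.
taint = {"key": "piezoRestriction", "value": "executors", "effect": "NoSchedule"}

executor_ok = tolerates(executor_toleration(), taint)
# A pod carrying only an unrelated toleration is kept off the tainted node.
other_ok = tolerates(
    {"key": "other", "operator": "Equal", "value": "x", "effect": "NoSchedule"}, taint
)
```

If a different taint is chosen, the key, value and effect in the toleration have to change with it, which is why the default values in the validation rules would need updating as well.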

137

THIS BRANCH HAS NOT HAD SYSTEM TESTS RUN AGAINST IT AND SHOULD NOT BE MERGED UNTIL THEY HAVE SUCCESSFULLY RUN WITH IT

The wiki also needs updating: the "setting up a kubernetes cluster" page should gain the requirement to set up taints on a selection of nodes for executors. This should not be done until the PR is merged, because until then no pods will have tolerations allowing them to run on the tainted nodes, which would therefore remain unused.

oliver-tarrant-tessella commented 5 years ago

Before merging, add the following paragraph to the wiki page on setting up a Kubernetes cluster, between "Add a priority class" and "Configure for spark applications":

Taint nodes

We will taint some nodes on the Kubernetes cluster to reserve them for Spark executor pods. This ensures that there is always room on the cluster for Spark jobs to run, so that the whole system does not get jammed. To see which nodes are part of your cluster, run:

kubectl get nodes

One node will have the role master. This node must not be altered, as it is where the core systems for Kubernetes run. From the other nodes, choose which ones you want to dedicate to executors and note them down. Make sure you leave enough nodes untainted to run all the essential Piezo app services and the Spark driver pods. To apply the taint to the nodes that you have reserved for executors, run the following for each node:

kubectl taint nodes {node name} piezoRestriction=executors:NoSchedule

Once the taint is applied, a node will only allow pods that tolerate it, that is the executor pods, to be scheduled onto it. Executor pods created by the Piezo web app are given this toleration by default.
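When several nodes are being reserved, it may help to generate the commands rather than type each one. The sketch below only builds the command strings and runs nothing. The helper name and the node names are hypothetical, and it uses the `kubectl taint nodes <name> <taint>` form, since kubectl expects the resource type before the node name.

```python
import shlex


def taint_commands(node_names, taint="piezoRestriction=executors:NoSchedule"):
    """Build one `kubectl taint nodes` command string per node; nothing is executed."""
    return [
        shlex.join(["kubectl", "taint", "nodes", name, taint])
        for name in node_names
    ]


# Hypothetical node names; replace with names noted down from `kubectl get nodes`.
commands = taint_commands(["worker-2", "worker-3"])
for command in commands:
    print(command)
```

Review the printed commands before running them, and check that the master node is not in the list.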