Open simonhampe opened 1 year ago
Hey, thanks for your report. You are correct that it's currently not possible to define tolerations on Spark executor (or driver) pods.
The spot use-case sounds reasonable and we'll look into it.
As Razvan said, this is currently not possible, but we are tracking it in https://github.com/stackabletech/issues/issues/385. Another customer has asked for this as well, so we will try to get it into the next release.
Hey,
starting with release 23.7, you can specify pod overrides for all SparkApplication pods.
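Below is a simple example that adds a toleration to the job, driver, and executor pods so that they can be scheduled on nodes carrying the taint monitor=true:NoSchedule. Note that a toleration only permits scheduling on tainted nodes; it does not keep pods away from them.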
job:
  podOverrides:
    spec:
      tolerations:
        - key: "monitor"
          value: "true"
          operator: "Equal"
          effect: "NoSchedule"
driver:
  podOverrides:
    spec:
      tolerations:
        - key: "monitor"
          value: "true"
          operator: "Equal"
          effect: "NoSchedule"
executor:
  podOverrides:
    spec:
      tolerations:
        - key: "monitor"
          value: "true"
          operator: "Equal"
          effect: "NoSchedule"
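For the Azure Spot node pool described in the original report, the same mechanism could be applied to the executors only. The following is a minimal sketch, not a tested configuration: it assumes the podOverrides layout from the example above and uses the taint quoted in the report (kubernetes.azure.com/scalesetpriority=spot:NoSchedule). The driver and job pods are deliberately left without the toleration so that they stay on the default node pool.

```yaml
# Sketch only: fragment of a SparkApplication spec, following the example above.
# Assumption: the Spot node pool carries the taint
#   kubernetes.azure.com/scalesetpriority=spot:NoSchedule
executor:
  podOverrides:
    spec:
      tolerations:
        # Let executor pods tolerate the AKS Spot taint.
        - key: "kubernetes.azure.com/scalesetpriority"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
```

The toleration only allows executors onto the Spot nodes. Steering them there still requires the node affinity mentioned in the report.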
I hope this helps.
Affected version
23.4.0
Current and expected behavior
We are deploying Stackable on Azure with AKS using Helm/Terraform. We have successfully run SparkApplications on the default node pool. However, we would like to be able to deploy executors in a second node pool containing only Spot instances. In Azure, all Spot instance node pools automatically get the taint
kubernetes.azure.com/scalesetpriority=spot:NoSchedule
(even if we do not specify it in the Terraform file; this taint is apparently mandatory). I can specify nodeAffinity to match the Spot instances' labels, but I haven't found a way to pass tolerations. The Helm chart for the Spark operator has a "tolerations" variable, and I tried passing the right toleration there (as specified here), but it had no effect: the executors will not schedule, since their affinity does not match the default node pool and they have no toleration for the spot taint.
Is there a way to pass tolerations in a SparkApplication that I have just overlooked? If not, I think this would be a fairly relevant feature for pod placement. Are there any plans to implement this?