stackabletech / spark-k8s-operator

Operator for Apache Spark-on-Kubernetes for Stackable Data Platform
https://stackable.tech

Feature request: Passing Tolerations to executor pods #240

Open · simonhampe opened this issue 1 year ago

simonhampe commented 1 year ago

Affected version

23.4.0

Current and expected behavior

We are deploying Stackable on Azure (AKS) using Helm and Terraform, and we have successfully run SparkApplications on the default node pool. Now we would like to schedule executors in a second node pool that contains only Spot instances. In Azure, every Spot node pool automatically gets the taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule; even if we do not specify it in the Terraform file, this taint is apparently mandatory.

I can specify nodeAffinity to match the spot instances' labels, but I haven't found a way to pass tolerations. The Helm chart for the Spark operator has a "tolerations" variable and I tried passing the right toleration there (as specified here), but it had no effect: the executors will not schedule, since their affinity does not match the default node pool and they have no toleration for the spot pool's taint.
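
For reference, the toleration I passed via the chart's values looked roughly like this (a sketch; the key and value are the ones Azure documents for Spot node pools). As far as I can tell, this setting only applies to the operator pod itself, not to the driver or executor pods it creates:

  # values.yaml for the spark-k8s-operator Helm chart (sketch)
  # Note: this tolerates the spot taint for the operator deployment only
  tolerations:
    - key: "kubernetes.azure.com/scalesetpriority"
      operator: "Equal"
      value: "spot"
      effect: "NoSchedule"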

Is there a way to pass tolerations in a SparkApplication that I have just overlooked? If not, I think this would be a fairly relevant feature for pod placement. Are there any plans to implement it?

razvan commented 1 year ago

Hey, thanks for your report. You are correct that it's currently not possible to define tolerations on Spark executor (or driver) pods.

The spot use-case sounds reasonable and we'll look into it.

lfrancke commented 1 year ago

As Razvan said, it's currently not possible, but we track this in https://github.com/stackabletech/issues/issues/385. Another customer has asked for this as well, so we will try to get it into the next release.

razvan commented 1 year ago

Hey,

starting with release 23.7 you can specify pod overrides for all SparkApplication pods.

Below is a simple example that adds a toleration for a monitor=true:NoSchedule taint to all SparkApplication pods, so they may be scheduled on nodes carrying that taint. The job, driver and executor sections go directly under the SparkApplication's spec:

  job:
    podOverrides:
      spec:
        tolerations:
          - key: "monitor"
            value: "true"
            operator: "Equal"
            effect: "NoSchedule"
  driver:
    podOverrides:
      spec:
        tolerations:
          - key: "monitor"
            value: "true"
            operator: "Equal"
            effect: "NoSchedule"
  executor:
    podOverrides:
      spec:
        tolerations:
          - key: "monitor"
            value: "true"
            operator: "Equal"
            effect: "NoSchedule"

I hope this helps.