kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

node affinity doesn't work #1519

Closed: ghost closed this issue 1 month ago

ghost commented 2 years ago

This feature is very important. For example, if the cluster has both CPU machines and GPU machines, we must be able to control which nodes the Spark pods are scheduled on!

[Screenshot attached (WeCom screenshot, 2022-04-27 13:49:54)]
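For context, here is a minimal sketch of what the reporter is presumably trying to do: a SparkApplication (the operator's `sparkoperator.k8s.io/v1beta2` resource) whose driver is kept on CPU nodes and whose executors are kept on GPU nodes via `affinity.nodeAffinity`. The node label `node-type`, its values, the image, and the service account are hypothetical placeholders for your own cluster. In operator versions where affinity is applied by the mutating admission webhook, these fields are not applied when the webhook is disabled, which would match the symptom described here.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-gpu              # hypothetical name
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: apache/spark:3.5.0       # hypothetical image choice
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark         # hypothetical service account
    affinity:
      nodeAffinity:
        # Hard requirement: the driver may only run on nodes labelled node-type=cpu.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-type  # hypothetical node label
                  operator: In
                  values: ["cpu"]
  executor:
    instances: 2
    cores: 1
    memory: 512m
    affinity:
      nodeAffinity:
        # Hard requirement: executors may only run on nodes labelled node-type=gpu.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-type
                  operator: In
                  values: ["gpu"]
```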

oscar-dela commented 2 years ago

In my experience, this can be caused by the mutating admission webhook not being enabled.

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md#about-the-mutating-admission-webhook
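If the webhook is the missing piece, it can be enabled when installing the operator with Helm. A minimal sketch of a values override, assuming the chart exposes the `webhook.enable` key described in the quick-start guide linked above (check the `values.yaml` of your chart version, since key names can change between releases):

```yaml
# values override for the spark-operator Helm chart
# (key name assumed from the quick-start guide; verify against your chart version)
webhook:
  enable: true
```

The same override can be passed on the command line as `--set webhook.enable=true` to `helm install` or `helm upgrade`.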

josecsotomorales commented 2 years ago

I use nodeSelector and it works for me, in case you want to give it a try.
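A minimal sketch of that approach in a SparkApplication, using a hypothetical node label `node-type: gpu`. My understanding (an assumption, not confirmed in this thread) is that the application-level `spec.nodeSelector` is passed to spark-submit as `spark.kubernetes.node.selector.*` properties and therefore does not depend on the webhook. Note that a selector only restricts pods to nodes carrying matching labels; it cannot express the richer rules that `affinity` can.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-gpu              # hypothetical name
spec:
  # ... type, image, mainApplicationFile, driver, executor as usual ...
  # Application-level selector: applies to both driver and executor pods.
  nodeSelector:
    node-type: gpu                # hypothetical node label
```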

elihschiff commented 2 years ago

I had the same issue and got it working with a pod template file, without needing to enable the webhook: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1176#issuecomment-1179287656

ghost commented 2 years ago

> I had the same issue and was able to get it working using a pod template file without the need to enable the webhook #1176 (comment)

Can you give an example of the pod template?

elihschiff commented 2 years ago

My pod template has a lot of settings specific to my infrastructure, but any valid pod YAML file should work. The Spark on Kubernetes docs and the EMR on EKS guide have more details: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/pod-templates.html
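As an illustration only, a minimal executor pod template that requires GPU nodes might look like the following. The file name and the `node-type` label are hypothetical. Spark uses the template as the starting point for the pods it builds, so the template only needs the fields you want to add; here it carries nothing but the node affinity rule.

```yaml
# executor-template.yaml (hypothetical file name)
apiVersion: v1
kind: Pod
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: executor pods may only run on nodes labelled node-type=gpu.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-type    # hypothetical node label
                operator: In
                values: ["gpu"]
```

Pod templates require Spark 3.0 or later and are wired in through Spark configuration, for example `spark.kubernetes.executor.podTemplateFile` (and `spark.kubernetes.driver.podTemplateFile` for the driver) in the SparkApplication's `sparkConf`. The path has to be readable by the process that runs spark-submit, which with the operator is the operator pod; mounting the file there, for example from a ConfigMap, is an assumption about your deployment.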

archongum commented 1 year ago

--conf spark.kubernetes.node.selector.kubernetes.io/hostname=sz-exa-cpu-10 works for me.

Ref: https://spark.apache.org/docs/latest/running-on-kubernetes.html
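The same property can be set from a SparkApplication through `sparkConf`, which the operator forwards to spark-submit as `--conf` options. A sketch reusing the hostname from the comment above; pinning to one hostname is shown only because that is what was used here, and a label shared by a whole class of machines (for example all GPU nodes) is usually the better key.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-pinned           # hypothetical name
spec:
  # ... type, image, mainApplicationFile, driver, executor as usual ...
  sparkConf:
    # Equivalent of: --conf spark.kubernetes.node.selector.kubernetes.io/hostname=sz-exa-cpu-10
    "spark.kubernetes.node.selector.kubernetes.io/hostname": "sz-exa-cpu-10"
```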

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.