kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

GPU resource allocation, tolerations for GPU taints on the node, and node affinity don't work #1381

Open roligupt opened 2 years ago

roligupt commented 2 years ago

Below is how I have configured the driver and executor in my test. None of the configuration is applied, and my test fails because the driver is scheduled on a CPU node and can't find the libraries required to run the job.

I am using a Spark image built with CUDA libraries, which has all the libraries required for my test.

Can someone help me understand why none of the GPU configuration, node affinity, or tolerations get applied?

driver:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
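For reference, the configuration posted above is cut off, so here is a minimal sketch of what a complete driver and executor section with a GPU request, a toleration, and node affinity might look like in a `SparkApplication` spec. This is not the original poster's configuration: the taint key, node label key, and label value are hypothetical placeholders, and the GPU resource name assumes the NVIDIA device plugin is in use.

```yaml
# Fragment of a SparkApplication spec (sparkoperator.k8s.io/v1beta2).
# Illustrative only: taint keys, label keys, and values are placeholders.
spec:
  driver:
    gpu:
      name: "nvidia.com/gpu"        # assumed resource name (NVIDIA device plugin)
      quantity: 1
    tolerations:
      - key: "nvidia.com/gpu"       # hypothetical taint key on the GPU nodes
        operator: "Exists"
        effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: "accelerator"        # hypothetical node label
                  operator: In
                  values: ["nvidia-gpu"]    # hypothetical label value
  executor:
    gpu:
      name: "nvidia.com/gpu"        # assumed resource name
      quantity: 1
    tolerations:
      - key: "nvidia.com/gpu"       # hypothetical taint key
        operator: "Exists"
        effect: "NoSchedule"
```

One thing that may be worth checking: as far as I recall from the operator's user guide, pod-level customizations such as tolerations and affinity are applied through the operator's mutating admission webhook, so they may be silently ignored if the webhook is not enabled in the deployment.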

doctapp commented 2 years ago

UPDATE: Sorry, my bad: affinity is working as expected on driver and executor specs. (Original comment: Same problem. Can we have support for affinity on driver and executor specs?)

doctapp commented 2 years ago

Sorry, my bad: affinity is working as expected.

On Wed, Feb 2, 2022 at 12:04 PM roligupt wrote:

oh so the affinity doesn't work either, I haven't tested that yet but I will be needing that.


github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.