kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Most of the properties don't work - What could be the reason? #1380

Open roligupt opened 2 years ago

roligupt commented 2 years ago

I have been testing the operator, and most of the properties don't work for me.

These are the properties I tested that don't work:

hadoopConfigMap:

driver/executor:
  affinity
  gpu
  tolerations:
  env:

I am using the latest Spark operator image, spark-operator:v1beta2-1.2.3-3.1.1, with the webhook enabled and its service running.

I can still define all of this configuration under the sparkConf: section, and everything set there works as expected:

sparkConf:
  "spark.kubernetes.kerberos.krb5.configMapName": krb5.conf
  "spark.kubernetes.hadoop.configMapName": hadoop-gdap-np-mil
  "spark.driver.resource.gpu.vendor": nvidia.com
  "spark.executor.resource.gpu.vendor": nvidia.com
  "spark.driver.resource.gpu.discoveryScript": /opt/oss/spark/examples/src/main/scripts/getGpusResources.sh
  "spark.executor.resource.gpu.discoveryScript": /opt/oss/spark/examples/src/main/scripts/getGpusResources.sh
  "spark.driver.resource.gpu.amount": "1"
  "spark.executor.resource.gpu.amount": "1"
  "spark.task.resource.gpu.amount": "1"

They don't work when defined like this:

driver:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:

jalkjaer commented 2 years ago

Sounds like the webhook is not working in your cluster. See: https://googlecloudplatform.github.io/spark-on-k8s-operator/docs/quick-start-guide.html#about-the-mutating-admission-webhook
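One way to test this diagnosis is to compare what the SparkApplication requests with what the driver pod actually received. The following is only a minimal sketch: it assumes both objects have already been loaded as dicts (for example from `kubectl get ... -o json` output), the function name `missing_webhook_fields` is hypothetical and not part of the operator, and it checks only `affinity` and `tolerations`. It also assumes nothing other than the webhook sets affinity on the pod.

```python
def missing_webhook_fields(spark_app: dict, driver_pod: dict) -> list:
    """Return driver settings requested in the SparkApplication but absent from the pod.

    Hypothetical helper for illustration; it inspects plain dicts and
    never talks to a cluster.
    """
    wanted = spark_app.get("spec", {}).get("driver", {})
    actual = driver_pod.get("spec", {})
    missing = []

    # Affinity: a presence check is enough under the stated assumption
    # that only the webhook would have put affinity on the pod.
    if wanted.get("affinity") and not actual.get("affinity"):
        missing.append("affinity")

    # Tolerations: Kubernetes adds default tolerations to pods on its own,
    # so compare by key instead of checking that the list is non-empty.
    pod_keys = {t.get("key") for t in (actual.get("tolerations") or [])}
    for tol in wanted.get("tolerations") or []:
        if tol.get("key") not in pod_keys:
            missing.append("tolerations[key=%s]" % tol.get("key"))

    return missing
```

If the function reports fields as missing while the same job runs fine with sparkConf settings, that is consistent with the webhook never mutating the pod.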

roligupt commented 2 years ago


Yes, that's what I figured from my testing yesterday: none of the features that depend on the webhook are working. The webhook service is running, and the spark-operator logs show these messages:

I1108 20:28:11.016882 9 main.go:144] Starting the Spark Operator
I1108 20:28:11.018405 9 webhook.go:218] Starting the Spark admission webhook server
W1108 20:28:11.114090 9 warnings.go:67] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration
I1108 20:28:11.115066 9 webhook.go:412] Creating a MutatingWebhookConfiguration for the Spark pod admission webhook
W1108 20:28:11.123325 9 warnings.go:67] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration
I1108 20:28:11.210172 9 main.go:219] Starting application controller goroutines

I don't understand why it is not working. It doesn't show any errors anywhere.

I am looking into alternative ways of getting the features that depend on the webhook.
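For longer operator logs, a small filter can make it easier to confirm that nothing above info level was logged. This is a rough sketch that assumes the standard klog line layout seen in the messages above (severity letter, date, time, thread id, `file:line]`, message); the names `KLOG_LINE`, `LogEntry`, and `non_info_entries` are made up for illustration. An empty result only means the operator logged no warnings or errors; it does not prove that the API server is actually calling the webhook.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# klog header: severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line]
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]\s]+:\d+)\] (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    severity: str  # "W", "E", or "F"
    source: str    # e.g. "warnings.go:67"
    message: str


def non_info_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield every parsed log line whose severity is not info ("I")."""
    for line in lines:
        match = KLOG_LINE.match(line.strip())
        if match and match.group("severity") != "I":
            yield LogEntry(
                match.group("severity"),
                match.group("source"),
                match.group("message"),
            )
```

Run against the log excerpt in this comment, the filter would yield only the two `warnings.go:67` deprecation warnings about `admissionregistration.k8s.io/v1beta1`.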

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.