Closed jesumyip closed 1 month ago
@jesumyip Could you provide detailed information about how you installed the Helm chart?
You can also try out the latest version if you'd like, as it has fixed many problems related to the webhook.
Hi @ChenYi015
I have tried the latest version you provided.
```yaml
spark:
  jobNamespaces:
  - ""
controller:
  logLevel: "debug"
webhook:
  logLevel: "debug"
spark:
  serviceAccount:
    create: true
    name: spark-sa
```
- Everything was created with no errors. 2 pods are running: one for `spark-operator-controller` and one for `spark-operator-webhook`
- I then created a `SparkApplication`:
```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: test-hosts
  namespace: xxx
spec:
  type: Python
  mode: cluster
  image: "
```
And I waited about 1 minute, but still no pod was created in the namespace xxx. I checked the logs for the operator and webhook pods and saw nothing new - only the logs from when the 2 pods started up.
```
>> kubectl get sparkapplication
NAME         STATUS   ATTEMPTS   START   FINISH   AGE
test-hosts                                        9m52s
```
Are there some permissions that are incorrectly set? I don't see any errors logged in the 2 pods in the `spark-operator` namespace...
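For a silent failure like this, a few `kubectl` commands usually surface where things are stuck - a sketch, with resource names assumed from the example above (the deployment name is an assumption based on the pod names):

```shell
# Inspect the SparkApplication's status conditions and attached events
kubectl describe sparkapplication test-hosts -n xxx

# Recent events in the job namespace often show admission or RBAC failures
kubectl get events -n xxx --sort-by=.metadata.creationTimestamp

# Confirm which namespaces the controller was actually started with
kubectl -n spark-operator logs deploy/spark-operator-controller | grep -i namespaces
```

If the controller's `--namespaces` flag doesn't include the namespace where the `SparkApplication` lives, the resource will simply never be reconciled, with no error logged.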
operator pod logs
++ id -u
+ uid=0
++ id -g
+ gid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator controller start --zap-log-level=debug --namespaces=default --controller-threads=10 --enable-ui-service=true --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-controller-lock --leader-election-lock-namespace=spark-operator
Spark Operator Version: v2.0.0-rc.0+unknown
Build Date: 2024-08-12T02:57:44+00:00
Git Commit ID:
Git Tree State: clean
Go Version: go1.22.5
Compiler: gc
Platform: linux/amd64
2024-08-20T14:32:27.118Z INFO controller/start.go:251 Starting manager
2024-08-20T14:32:27.119Z INFO controller-runtime.metrics server/server.go:205 Starting metrics server
2024-08-20T14:32:27.119Z INFO manager/server.go:50 starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-08-20T14:32:27.119Z INFO controller-runtime.metrics server/server.go:244 Serving metrics server {"bindAddress": ":8080", "secure": false}
I0820 14:32:27.119306 10 leaderelection.go:250] attempting to acquire leader lease spark-operator/spark-operator-controller-lock...
I0820 14:32:27.136595 10 leaderelection.go:260] successfully acquired lease spark-operator/spark-operator-controller-lock
2024-08-20T14:32:27.136Z DEBUG events recorder/recorder.go:104 spark-operator-controller-5f7497d6f5-9lxl4_ea1b7250-f6fd-42ec-9bbc-debb1a803c58 became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"spark-operator","name":"spark-operator-controller-lock","uid":"ef251560-cdef-4b4f-9080-ec9a4eecab1f","apiVersion":"coordination.k8s.io/v1","resourceVersion":"5067755"}, "reason": "LeaderElection"}
2024-08-20T14:32:27.136Z INFO controller/controller.go:178 Starting EventSource {"controller": "spark-application-controller", "source": "kind source: *v1.Pod"}
2024-08-20T14:32:27.136Z INFO controller/controller.go:178 Starting EventSource {"controller": "scheduled-spark-application-controller", "source": "kind source: *v1beta2.ScheduledSparkApplication"}
2024-08-20T14:32:27.136Z INFO controller/controller.go:178 Starting EventSource {"controller": "spark-application-controller", "source": "kind source: *v1beta2.SparkApplication"}
2024-08-20T14:32:27.136Z INFO controller/controller.go:186 Starting Controller {"controller": "scheduled-spark-application-controller"}
2024-08-20T14:32:27.136Z INFO controller/controller.go:186 Starting Controller {"controller": "spark-application-controller"}
2024-08-20T14:32:27.237Z INFO controller/controller.go:220 Starting workers {"controller": "spark-application-controller", "worker count": 10}
2024-08-20T14:32:27.237Z INFO controller/controller.go:220 Starting workers {"controller": "scheduled-spark-application-controller", "worker count": 10}
webhook pod logs
++ id -u
+ uid=0
++ id -g
+ gid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=debug --namespaces=default --webhook-secret-name=spark-operator-webhook-certs --webhook-secret-namespace=spark-operator --webhook-svc-name=spark-operator-webhook-svc --webhook-svc-namespace=spark-operator --webhook-port=9443 --mutating-webhook-name=spark-operator-webhook --validating-webhook-name=spark-operator-webhook --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-webhook-lock --leader-election-lock-namespace=spark-operator
Spark Operator Version: v2.0.0-rc.0+unknown
Build Date: 2024-08-12T02:57:44+00:00
Git Commit ID:
Git Tree State: clean
Go Version: go1.22.5
Compiler: gc
Platform: linux/amd64
2024-08-20T14:32:27.297Z INFO webhook/start.go:243 Syncing webhook secret {"name": "spark-operator-webhook-certs", "namespace": "spark-operator"}
2024-08-20T14:32:27.772Z INFO webhook/start.go:257 Writing certificates {"path": "/etc/k8s-webhook-server/serving-certs", "certificate name": "tls.crt", "key name": "tls.key"}
2024-08-20T14:32:27.773Z INFO controller-runtime.builder builder/webhook.go:158 Registering a mutating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=SparkApplication", "path": "/mutate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-08-20T14:32:27.773Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/mutate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-08-20T14:32:27.773Z INFO controller-runtime.builder builder/webhook.go:189 Registering a validating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=SparkApplication", "path": "/validate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-08-20T14:32:27.773Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/validate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-08-20T14:32:27.773Z INFO controller-runtime.builder builder/webhook.go:158 Registering a mutating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=ScheduledSparkApplication", "path": "/mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-08-20T14:32:27.773Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-08-20T14:32:27.773Z INFO controller-runtime.builder builder/webhook.go:189 Registering a validating webhook {"GVK": "sparkoperator.k8s.io/v1beta2, Kind=ScheduledSparkApplication", "path": "/validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-08-20T14:32:27.773Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-08-20T14:32:27.773Z INFO controller-runtime.builder builder/webhook.go:158 Registering a mutating webhook {"GVK": "/v1, Kind=Pod", "path": "/mutate--v1-pod"}
2024-08-20T14:32:27.773Z INFO controller-runtime.webhook webhook/server.go:183 Registering webhook {"path": "/mutate--v1-pod"}
2024-08-20T14:32:27.773Z INFO controller-runtime.builder builder/webhook.go:204 skip registering a validating webhook, object does not implement admission.Validator or WithValidator wasn't called {"GVK": "/v1, Kind=Pod"}
2024-08-20T14:32:27.773Z INFO webhook/start.go:319 Starting manager
2024-08-20T14:32:27.773Z INFO controller-runtime.metrics server/server.go:205 Starting metrics server
2024-08-20T14:32:27.773Z INFO manager/server.go:50 starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-08-20T14:32:27.773Z INFO controller-runtime.webhook webhook/server.go:191 Starting webhook server
2024-08-20T14:32:27.774Z INFO controller-runtime.metrics server/server.go:244 Serving metrics server {"bindAddress": ":8080", "secure": false}
2024-08-20T14:32:27.774Z INFO webhook/start.go:357 disabling http/2
2024-08-20T14:32:27.774Z DEBUG controller-runtime.healthz healthz/healthz.go:60 healthz check failed {"checker": "readyz", "error": "webhook server has not been started yet"}
2024-08-20T14:32:27.774Z INFO controller-runtime.healthz healthz/healthz.go:128 healthz check failed {"statuses": [{}]}
I0820 14:32:27.774433 10 leaderelection.go:250] attempting to acquire leader lease spark-operator/spark-operator-webhook-lock...
2024-08-20T14:32:27.774Z INFO controller-runtime.certwatcher certwatcher/certwatcher.go:161 Updated current TLS certificate
2024-08-20T14:32:27.774Z INFO controller-runtime.webhook webhook/server.go:242 Serving webhook server {"host": "", "port": 9443}
2024-08-20T14:32:27.774Z INFO controller-runtime.certwatcher certwatcher/certwatcher.go:115 Starting certificate watcher
I0820 14:32:27.791240 10 leaderelection.go:260] successfully acquired lease spark-operator/spark-operator-webhook-lock
2024-08-20T14:32:27.791Z INFO controller/controller.go:178 Starting EventSource {"controller": "validating-webhook-configuration-controller", "source": "kind source: *v1.ValidatingWebhookConfiguration"}
2024-08-20T14:32:27.791Z INFO controller/controller.go:178 Starting EventSource {"controller": "mutating-webhook-configuration-controller", "source": "kind source: *v1.MutatingWebhookConfiguration"}
2024-08-20T14:32:27.791Z INFO controller/controller.go:186 Starting Controller {"controller": "validating-webhook-configuration-controller"}
2024-08-20T14:32:27.791Z INFO controller/controller.go:186 Starting Controller {"controller": "mutating-webhook-configuration-controller"}
2024-08-20T14:32:27.791Z DEBUG events recorder/recorder.go:104 spark-operator-webhook-75d88ff76d-549nw_aab28de5-4e4d-49ca-931c-c319031dbdba became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"spark-operator","name":"spark-operator-webhook-lock","uid":"29e67682-4868-46a9-a954-592b2ad0d6cb","apiVersion":"coordination.k8s.io/v1","resourceVersion":"5067773"}, "reason": "LeaderElection"}
2024-08-20T14:32:27.892Z INFO validatingwebhookconfiguration/event_handler.go:46 ValidatingWebhookConfiguration created {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.892Z INFO controller/controller.go:220 Starting workers {"controller": "validating-webhook-configuration-controller", "worker count": 1}
2024-08-20T14:32:27.892Z INFO controller/controller.go:220 Starting workers {"controller": "mutating-webhook-configuration-controller", "worker count": 1}
2024-08-20T14:32:27.892Z INFO mutatingwebhookconfiguration/event_handler.go:46 MutatingWebhookConfiguration created {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.897Z INFO mutatingwebhookconfiguration/controller.go:72 Updating CA bundle of MutatingWebhookConfiguration {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.897Z INFO validatingwebhookconfiguration/controller.go:73 Updating CA bundle of ValidatingWebhookConfiguration {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.907Z INFO mutatingwebhookconfiguration/event_handler.go:68 MutatingWebhookConfiguration updated {"name": "spark-operator-webhook", "namespace": ""}
2024-08-20T14:32:27.912Z INFO validatingwebhookconfiguration/event_handler.go:68 ValidatingWebhookConfiguration updated {"name": "spark-operator-webhook", "namespace": ""}
2024-08-20T14:32:27.917Z INFO mutatingwebhookconfiguration/controller.go:72 Updating CA bundle of MutatingWebhookConfiguration {"name": "spark-operator-webhook"}
2024-08-20T14:32:27.917Z INFO validatingwebhookconfiguration/controller.go:73 Updating CA bundle of ValidatingWebhookConfiguration {"name": "spark-operator-webhook"}
I also tried this values file, where I modified the spark job namespaces:

```yaml
spark:
  jobNamespaces:
  - "xxx"
```

I notice in the webhook pod the startup parameter is still shown as

+ exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=debug --namespaces=default....

Is this the reason no `SparkApplication` gets created - because of `--namespaces=default`?
I have just tried to set `spark.jobNamespaces` to `[test]`:
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={test}'
and the webhook pod logs show that the namespaces were correctly set:
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=info --namespaces=test --webhook-secret-name=spark-operator-webhook-certs --webhook-secret-namespace=spark-operator --webhook-svc-name=spark-operator-webhook-svc --webhook-svc-namespace=spark-operator --webhook-port=9443 --mutating-webhook-name=spark-operator-webhook --validating-webhook-name=spark-operator-webhook --enable-metrics=true --metrics-bind-address=:8080 --metrics-endpoint=/metrics --metrics-prefix= --metrics-labels=app_type --leader-election=true --leader-election-lock-name=spark-operator-webhook-lock --leader-election-lock-namespace=spark-operator
```yaml
spark:
  jobNamespaces:
  - ""
controller:
  logLevel: "debug"
webhook:
  logLevel: "debug"
spark:
  serviceAccount:
    create: true
    name: spark-sa
```
@jesumyip There is an issue related to cache settings when setting `spark.jobNamespaces` to all namespaces (`""`), and this will be fixed in PRs #2123 and #2128. So you need to set job namespaces to specific namespaces instead of `[""]`.
Looks like the Helm chart isn't compatible with kustomize. I used kustomize to install it, and the namespace for the webhook isn't picked up correctly; it is still shown as `--namespaces=default`.

kustomize build . --enable-helm > output.yaml

shows this:
Interestingly enough, when I modify the Helm chart at line 54 of `webhook/deployment.yaml` to become
{{- with .Values.duh.fish }}
- --namespaces={{ . | join "," }}
{{- end }}
and I set my values file to:

```yaml
duh:
  fish:
  - "xxx"
  - "test"
```
then the output is correct. I actually see

- --namespaces=xxx,test

The value of `default` seems to be picked up from the `values.yaml` file included in the Helm chart. I cannot seem to override it with my own values file.
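One way to check which value actually wins, without installing anything, is to render the chart locally and inspect the flag. A sketch, reusing the chart reference from the commands in this thread; the `--show-only` template path is an assumption about the chart's layout:

```shell
# Render only the webhook Deployment and look at the --namespaces argument
# (values.yaml here is your own override file)
helm template spark-operator spark-operator/spark-operator \
  --version 2.0.0-rc.0 \
  --namespace spark-operator \
  --values values.yaml \
  --show-only templates/webhook/deployment.yaml | grep -- '--namespaces'
```

Comparing the output with and without `--values values.yaml` shows directly whether the override file is being merged over the chart's default `values.yaml`.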
@ChenYi015 Now when I try installing it with
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={test,xxx}' \
--set 'spark.serviceAccount.name=spark-sa' \
--set 'spark.serviceAccount.create=true'
I can see the startup parameter for the webhook becomes `--namespaces=test,xxx`, which is expected.
But when I apply the `SparkApplication`, I can only see a `svc` being created in namespace `test`. There is no pod. There are also no additional logs in the `controller` and `webhook` pods. In the `driver` pod logs, I can see this:
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/bladerunner/pods/xxx-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "xxx" is forbidden: User "system:serviceaccount:test:spark-sa" cannot get resource "pods" in API group "" in the namespace "test": RBAC: role.rbac.authorization.k8s.io "spark-sa" not found.
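The error says the driver's service account has no Role named after it; normally the chart renders these RBAC objects itself. As a manual workaround while the chart is broken, a minimal sketch of the missing objects (names and namespace are taken from the error message; the rules list is an assumption - the chart's own rolebinding template is authoritative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-sa
  namespace: test
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-sa
  namespace: test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-sa
subjects:
- kind: ServiceAccount
  name: spark-sa
  namespace: test
```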
Now if I then reinstall the helm chart with
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={test,xxx}' \
and I have to change the service account in my `SparkApplication` YAML to `<helmchart-releasename>-spark`, then the driver pod is created properly. I can also see that the driver pod has the `envFrom` applied correctly.
@jesumyip Thanks for reporting the issue; the spark rolebinding template did not render properly when setting `spark.serviceAccount.name`. I will fix it in the next release.
@ChenYi015 Also have a look at that strangeness with the `spark.jobNamespaces` behaviour. I cannot seem to override the value provided in the default `values.yaml` file.
@ChenYi015 Nevermind, I found the problem with the `spark.jobNamespaces` behaviour. It was my mistake: my values file had two `spark:` sections.
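For reference, when a YAML file has two `spark:` blocks, most parsers keep only the last one, so the `jobNamespaces` override is silently dropped. Merging the duplicate keys into a single `spark:` block gives a values file where both settings take effect (the namespace entries here are the ones used earlier in this thread):

```yaml
spark:
  jobNamespaces:
  - "test"
  - "xxx"
  serviceAccount:
    create: true
    name: spark-sa
controller:
  logLevel: "debug"
webhook:
  logLevel: "debug"
```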
/kind bug
I've tried

and when I run a `kubectl describe pod` on the driver, I don't see those env vars being picked up. `mysecrets` is an opaque-type secret. To test whether the spark operator webhook is working, I tried switching the YAML config to:

and that works just fine.

Am I doing this wrongly? I am using version 1.4.6 of the Helm chart.