devscheffer opened this issue 3 months ago
@devscheffer Could you provide detailed information about how you installed the Helm chart? Was the service account `spark-sa` created by Helm or by yourself?
It is created by Helm:
```yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  labels:
    app: spark-operator
  name: spark-operator
  namespace: spark-operator
spec:
  chart:
    spec:
      chart: spark-operator
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: spark-operator
      version: 1.4.0
  interval: 5m0s
  releaseName: spark-operator
  values:
    image:
      repository: docker.io/kubeflow/spark-operator
      pullPolicy: IfNotPresent
      tag: ""
    rbac:
      create: false
      createRole: true
      createClusterRole: true
      annotations: {}
    serviceAccounts:
      spark:
        create: true
        name: "spark-sa"
        annotations: {}
      sparkoperator:
        create: true
        name: "spark-operator-sa"
        annotations: {}
    sparkJobNamespaces:
      - spark-operator
      - team-1
    webhook:
      enable: true
      port: 443
      portName: webhook
      namespaceSelector: ""
      timeout: 30
    metrics:
      enable: true
      port: 10254
      portName: metrics
      endpoint: /metrics
      prefix: ""
    tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
        effect: "NoSchedule"
```
It works when I submit manually from the terminal; however, when I execute it from Airflow, I get this error:

```text
from server for: "STDIN": sparkapplications.sparkoperator.k8s.io "pyspark-pi2" is forbidden: User "system:serviceaccount:team-1:spark-sa" cannot get resource "sparkapplications" in API group "sparkoperator.k8s.io" in the namespace "team-1"
```
Here is the task in Airflow:

```python
spark_kpo = KubernetesPodOperator(
    task_id="kpo",
    name="spark-app-submission",
    namespace=namespace,
    image="bitnami/kubectl:1.28.11",
    cmds=["/bin/bash", "-c"],
    arguments=[f"echo '{spark_app_manifest_content}' | kubectl apply -f -"],
    in_cluster=True,
    get_logs=True,
    service_account_name=service_account_name,
    on_finish_action="keep_pod",
)
```
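For context, `spark_app_manifest_content` was not quoted in the thread; a minimal SparkApplication manifest of the kind this task pipes into `kubectl apply` might look like the sketch below. The name and namespace come from the error message above; the image and application file follow the upstream pyspark-pi example and are assumptions:

```yaml
# Illustrative sketch only -- the actual manifest was not shared.
# Name/namespace taken from the error; image and paths are assumptions.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pyspark-pi2
  namespace: team-1
spec:
  type: Python
  mode: cluster
  image: spark:3.5.0
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.0"
  driver:
    serviceAccount: spark-sa   # the SA named in the Forbidden error
  executor:
    instances: 1
```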
@devscheffer The service account `spark-sa` actually does not have any permissions for SparkApplication; it is used by the Spark driver pods. If you want to submit a SparkApplication from Airflow, you can configure the service account name in the `KubernetesPodOperator` to `spark-operator-sa` instead. Or you can create a ServiceAccount manually and grant it all the permissions on SparkApplication.
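A minimal sketch of that manual grant, assuming the namespace `team-1` and the service account `spark-sa` from the error message above (the object names and the verb list are illustrative and can be trimmed to what your submission flow actually needs):

```yaml
# Hypothetical Role/RoleBinding granting SparkApplication access to spark-sa.
# Names are illustrative; namespace/SA match the Forbidden error above.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sparkapplication-editor
  namespace: team-1
rules:
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications", "sparkapplications/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sparkapplication-editor
  namespace: team-1
subjects:
  - kind: ServiceAccount
    name: spark-sa
    namespace: team-1
roleRef:
  kind: Role
  name: sparkapplication-editor
  apiGroup: rbac.authorization.k8s.io
```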
Hello. I'd like to say that I get the same result. I deployed the Helm chart v2.0.2 like so:
```shell
helm install spark-operator ./spark-operator \
  --version 2.0.2 \
  --create-namespace \
  --namespace spark-operator \
  --set 'spark.jobNamespaces={,airflow}' \
  --values ./values.yaml
```
The values.yaml for it was:
```yaml
nameOverride: ""
fullnameOverride: ""
commonLabels: {}
image:
  registry: docker.io
  repository: kubeflow/spark-operator
  tag: ""
  pullPolicy: IfNotPresent
  pullSecrets: []
controller:
  replicas: 1
  workers: 10
  logLevel: info
  uiService:
    enable: true
  uiIngress:
    enable: false
    urlFormat: ""
  batchScheduler:
    enable: true
    kubeSchedulerNames:
      - volcano
    default: ""
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
  labels: {}
  annotations: {}
  volumes: []
  nodeSelector: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - test-node
  tolerations:
    - key: "airflow"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  priorityClassName: ""
  podSecurityContext: {}
  topologySpreadConstraints: []
  env: []
  envFrom: []
  volumeMounts: []
  resources: {}
  securityContext: {}
  sidecars: []
  podDisruptionBudget:
    enable: false
    minAvailable: 1
  pprof:
    enable: false
    port: 6060
    portName: pprof
  workqueueRateLimiter:
    bucketQPS: 50
    bucketSize: 500
    maxDelay:
      enable: true
      duration: 6h
webhook:
  enable: true
  replicas: 1
  logLevel: info
  port: 9443
  portName: webhook
  failurePolicy: Fail
  timeoutSeconds: 10
  resourceQuotaEnforcement:
    enable: false
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
  labels: {}
  annotations: {}
  sidecars: []
  volumes: []
  nodeSelector: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - test-node
  tolerations:
    - key: "airflow"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  priorityClassName: ""
  podSecurityContext: {}
  topologySpreadConstraints: []
  env: []
  envFrom: []
  volumeMounts: []
  resources: {}
  securityContext: {}
  podDisruptionBudget:
    enable: false
    minAvailable: 1
spark:
  jobNamespaces:
    - "airflow"
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  rbac:
    create: true
    annotations: {}
prometheus:
  metrics:
    enable: true
    port: 8080
    portName: metrics
    endpoint: /metrics
    prefix: ""
  podMonitor:
    create: true
    labels: {}
    jobLabel: spark-operator-podmonitor
    podMetricsEndpoint:
      scheme: http
      interval: 5s
```
And right after that, when I run a DAG from Airflow, the resulting spark-submit pod fails with the following error:
```text
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '35324a3b-9f01-4c3b-bf56-445ea8746423', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '8bae74e0-9f4b-483f-8878-77b94fe77097', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b1662841-0cf0-4ed4-8ade-b34262bca683', 'Date': 'Fri, 18 Oct 2024 08:05:50 GMT', 'Content-Length': '483'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"sparkapplications.sparkoperator.k8s.io \"spark-submit-soyzhqvo\" is forbidden: User \"system:serviceaccount:transgran-spreads:airflow-worker\" cannot get resource \"sparkapplications/status\" in API group \"sparkoperator.k8s.io\" in the namespace \"airflow\"","reason":"Forbidden","details":{"name":"spark-submit-soyzhqvo","group":"sparkoperator.k8s.io","kind":"sparkapplications"},"code":403}
```
This can be fixed by adding:

to `airflow-pod-launcher-role` (Role):

```yaml
apiGroups:
```

to `spark-operator-spark` (RoleBinding):

```yaml
- kind: ServiceAccount
  name: default
  namespace: airflow
```
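Spelled out, the fix might look like the following sketch. The comment above truncates the Role rule after `apiGroups:`, so the resources and verbs here are assumptions inferred from the Forbidden error, not the exact rule that was applied:

```yaml
# Assumed extra rule for the airflow-pod-launcher-role Role
# (resources/verbs are guesses; the original rule was truncated):
rules:
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications", "sparkapplications/status"]
    verbs: ["get", "list", "watch", "create", "patch"]
---
# Extra subject for the spark-operator-spark RoleBinding, as quoted above:
subjects:
  - kind: ServiceAccount
    name: default
    namespace: airflow
```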
Given all of the above, I'd like to ask why these fixes weren't included in the Helm chart.
Description
I use the Helm chart of the Spark operator; it is deployed in the namespace spark-operator. In the HelmRelease I configure `sparkJobNamespaces: spark-jobs`, which is the namespace where I want to run the jobs. However, I'm getting this error:
```text
Name: "pyspark-pi", Namespace: "spark-jobs"
from server for: "STDIN": sparkapplications.sparkoperator.k8s.io "pyspark-pi" is forbidden: User "system:serviceaccount:spark-jobs:spark-sa" cannot get resource "sparkapplications" in API group "sparkoperator.k8s.io" in the namespace "spark-jobs"
```