puckel / docker-airflow

Docker Apache Airflow
Apache License 2.0

Scheduler throws API exception (403) #473

Open Am1rr3zA opened 4 years ago

Am1rr3zA commented 4 years ago

I am trying to use airflow-kube-helm to deploy Airflow on my Kubernetes cluster and take advantage of the KubernetesExecutor to run my DAGs.

From the UI everything looks fine, but I noticed my DAG would not actually run, so I checked the scheduler logs and found this:

kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '8e20cdec-9be7-42bb-a162-ee5a80fb8f76', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Mon, 25 Nov 2019 22:48:35 GMT', 'Content-Length': '283'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:airflow:default\" cannot watch resource \"pods\" in API group \"\" in the namespace \"airflow\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}\n'

Process KubernetesJobWatcher-3476:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 267, in run
    self.worker_uuid)
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 288, in _run
    **kwargs):
  File "/usr/local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 142, in stream
    resp = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12372, in list_namespaced_pod
    (data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 12472, in list_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 334, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 168, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 355, in request
    headers=headers)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden

I searched a bit and it seems to be related to RBAC, but I don't have any RBAC set up and I clearly have it disabled in my Helm chart:

rbac:
  enabled: false

Does anyone know what the solution is?

ravsa commented 4 years ago

@Am1rr3zA did you find any fix for this?

sonac commented 4 years ago

This error isn't related to the rbac option per se; something is wrong with your roles, cluster roles, or role bindings in the cluster. The message in the HTTP response body should give you more context on what exactly is going on.
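
For readers hitting the same 403, the response body already names which identity was denied which action. Kubernetes service account usernames have the form system:serviceaccount:<namespace>:<name>, so the user in the original error is the service account "default" in the namespace "airflow". The following is a minimal sketch that pulls those fields out of the body; the helper name and the regex group names are made up for illustration and are not part of Airflow or the Kubernetes client.

```python
import json
import re

# Matches the "forbidden" message shown in this thread. The group names are
# labels chosen for this sketch, not Kubernetes terminology.
FORBIDDEN_RE = re.compile(
    r'User "system:serviceaccount:(?P<sa_namespace>[^:"]+):(?P<sa_name>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<api_group>[^"]*)" '
    r'in the namespace "(?P<namespace>[^"]+)"'
)


def explain_forbidden(body):
    """Hypothetical helper: parse a 403 Status body into its denied parts."""
    status = json.loads(body)
    match = FORBIDDEN_RE.search(status["message"])
    if match is None:
        raise ValueError("unrecognized message: " + status["message"])
    return match.groupdict()


# The response body from the original report, as a JSON string.
body = (
    '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    '"message":"pods is forbidden: User \\"system:serviceaccount:airflow:default\\" '
    'cannot watch resource \\"pods\\" in API group \\"\\" in the namespace \\"airflow\\"",'
    '"reason":"Forbidden","details":{"kind":"pods"},"code":403}'
)

print(explain_forbidden(body))
```

Here the denied verb is "watch" on "pods", which points at a missing Role or RoleBinding for that service account rather than at the chart's rbac flag itself.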

bhavaniravi commented 4 years ago

I have the exact same issue as above. I assumed it was because of "system:serviceaccount:airflow:default", i.e., the default namespace trying to access the airflow namespace. But "default" does not appear anywhere in my YAML files.

It would be great if someone could suggest a fix for this.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: admin-rbac
  namespace: airflow
subjects:
  - kind: ServiceAccount
    # Reference to upper's `metadata.name`
    name: airflow
    # Reference to upper's `metadata.namespace`
    namespace: airflow
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

lfreina commented 3 years ago

This is an old thread but I had a similar issue and I found that the namespace on the service account was not being defined.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow 
  namespace: airflow

Make sure the namespace is given when creating the service account. Going by what @bhavaniravi said, it sounds like it will take "default" as the namespace if you don't define one.
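
To make the namespace point concrete, here is a rough sketch that builds a namespaced Role and RoleBinding as plain dictionaries, with the namespace set explicitly on the metadata and on the service account subject. The function name, the "-pod-access" role name, and the default verb list are assumptions for illustration; the errors in this thread only show "watch" and "list" being denied, and a real executor setup may need more verbs.

```python
import json


def pod_access_manifests(sa_name, namespace, verbs=("get", "list", "watch")):
    """Hypothetical helper: Role + RoleBinding letting one service account act on pods."""
    role_name = sa_name + "-pod-access"  # made-up name for this sketch
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        # The core API group is the empty string, as in the error message.
        "rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": list(verbs)}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        # The subject carries its own namespace; leaving it out is the
        # mistake described in the comment above.
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return role, binding


role, binding = pod_access_manifests("airflow", "airflow")
print(json.dumps(role, indent=2))
print(json.dumps(binding, indent=2))
```

Unlike the cluster-admin ClusterRoleBinding posted earlier, this grants only pod access within a single namespace, which is usually the safer starting point.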

NickYadance commented 2 years ago

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
    _run_scheduler_job(args=args)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
    job.run()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 246, in run
    self._execute()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
    self._run_scheduler_loop()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 704, in _run_scheduler_loop
    self.adopt_or_reset_orphaned_tasks()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1140, in adopt_or_reset_orphaned_tasks
    for attempt in run_with_db_retries(logger=self.log):
  File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 382, in __iter__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.7/site-packages/tenacity/__init__.py", line 349, in iter
    return fut.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1185, in adopt_or_reset_orphaned_tasks
    to_reset = self.executor.try_adopt_task_instances(tis_to_reset_or_adopt)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/kubernetes_executor.py", line 682, in try_adopt_task_instances
    self._adopt_completed_pods(kube_client)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/kubernetes_executor.py", line 727, in _adopt_completed_pods
    pod_list = kube_client.list_namespaced_pod(namespace=self.kube_config.kube_namespace, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 12803, in list_namespaced_pod
    (data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 12905, in list_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 345, in call_api
    _preload_content, _request_timeout)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 366, in request
    headers=headers)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 241, in GET
    query_params=query_params)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ca1e0949-4c3d-48bb-baaa-68cd5f1f2f9e', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'fda3fbea-c809-473a-97be-2bd4573d4ea8', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'c5a40a51-d53a-4b67-8f44-beaf771d0c56', 'Date': 'Fri, 25 Mar 2022 03:12:27 GMT', 'Content-Length': '292'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:airflow:airflow-scheduler\" cannot list resource \"pods\" in API group \"\" in the namespace \"airflow\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

Same issue here on a fresh Helm-managed Airflow install.

NickYadance commented 2 years ago

After digging for a while, I finally got it working. The issue was that I set the executor parameter by editing airflow.cfg directly instead of through the Helm values.yaml. This causes Helm not to set up the Kubernetes RoleBinding correctly. The correct values.yaml should simply look like this:

## config:
##   core:
##     executor: 'KubernetesExecutor' ## This will cause auth issue.
executor: 'KubernetesExecutor'

The detailed root cause can be found in .//templates/rbac/pod-launcher-rolebinding.yaml:

{{- if and .Values.rbac.create .Values.allowPodLaunching }}
{{- $schedulerLaunchExecutors := list "LocalExecutor" "KubernetesExecutor" "CeleryKubernetesExecutor" }}
{{- $workerLaunchExecutors := list "CeleryExecutor" "KubernetesExecutor" "CeleryKubernetesExecutor" }}
{{- if .Values.multiNamespaceMode }}
kind: ClusterRoleBinding
{{- else }}
kind: RoleBinding
{{- end }}
apiVersion: rbac.authorization.k8s.io/v1
metadata:
{{- if not .Values.multiNamespaceMode }}
  namespace: "{{ .Release.Namespace }}"
{{- end }}
  name: {{ .Release.Name }}-pod-launcher-rolebinding
  labels:
    tier: airflow
    release: {{ .Release.Name }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    heritage: {{ .Release.Service }}
{{- with .Values.labels }}
{{ toYaml . | indent 4 }}
{{- end }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
{{- if .Values.multiNamespaceMode }}
  kind: ClusterRole
{{- else }}
  kind: Role
{{- end }}
  name: {{ .Release.Name }}-pod-launcher-role
subjects:
{{- if has .Values.executor $schedulerLaunchExecutors }}
  - kind: ServiceAccount
    name: {{ include "scheduler.serviceAccountName" . }}
    namespace: "{{ .Release.Namespace }}"
{{- end }}
{{- if has .Values.executor $workerLaunchExecutors }}   ## Root cause here.
  - kind: ServiceAccount
    name: {{ include "worker.serviceAccountName" . }}
    namespace: "{{ .Release.Namespace }}"
{{- end }}
{{- end }}
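
The subject selection in the template above can be approximated in a few lines of Python. This is only a sketch of the template's conditions, not code from the chart: the function name is invented, the "CeleryExecutor" value for an untouched top-level executor is an assumption about the chart default, and an unrendered template is modelled as an empty list.

```python
# Executor lists copied from the template above.
SCHEDULER_LAUNCH_EXECUTORS = ["LocalExecutor", "KubernetesExecutor", "CeleryKubernetesExecutor"]
WORKER_LAUNCH_EXECUTORS = ["CeleryExecutor", "KubernetesExecutor", "CeleryKubernetesExecutor"]


def pod_launcher_subjects(values):
    """Hypothetical helper: which service accounts the RoleBinding would list."""
    rbac_create = values.get("rbac", {}).get("create", False)
    if not (rbac_create and values.get("allowPodLaunching", False)):
        return []  # the template renders nothing in this case
    # Only the top-level executor is consulted; config.core.executor is ignored.
    executor = values.get("executor")
    subjects = []
    if executor in SCHEDULER_LAUNCH_EXECUTORS:
        subjects.append("scheduler")
    if executor in WORKER_LAUNCH_EXECUTORS:
        subjects.append("worker")
    return subjects


misconfigured = {
    "rbac": {"create": True},
    "allowPodLaunching": True,
    "executor": "CeleryExecutor",  # assumed chart default, left untouched
    "config": {"core": {"executor": "KubernetesExecutor"}},
}
fixed = dict(misconfigured, executor="KubernetesExecutor")

print(pod_launcher_subjects(misconfigured))  # ['worker']
print(pod_launcher_subjects(fixed))  # ['scheduler', 'worker']
```

Under these assumptions, setting the executor only in the config section leaves the scheduler's service account out of the binding, which would match the "airflow-scheduler ... cannot list resource pods" error above.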

IAlexEgorov commented 2 years ago

I have the same issue. I tried to solve this by adding rights from the default role, when find solution `