stwind / airflow-on-kubernetes

Bare minimal Airflow on Kubernetes (Local, EKS, AKS)

KubernetesJobWatcher. Failing #6

anayyar82 closed this issue 4 years ago

anayyar82 commented 4 years ago

While running the container and the server, I am getting the error below. Any idea how to resolve it?

[2020-04-01 01:59:41,123] {kubernetes_executor.py:447} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2020-04-01 01:59:41,141] {kubernetes_executor.py:351} INFO - Event: and now my watch begins starting at resource_version: 0
[2020-04-01 01:59:41,161] {kubernetes_executor.py:342} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 340, in run
    self.worker_uuid, self.kube_config)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 364, in _run
    **kwargs):
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 142, in stream
    resp = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 12803, in list_namespaced_pod
    (data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs) # noqa: E501
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 12905, in list_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 345, in call_api
    _preload_content, _request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 366, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 241, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Wed, 01 Apr 2020 01:59:41 GMT', 'Content-Length': '283'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \\"system:serviceaccount:default:default\\" cannot watch resource \\"pods\\" in API group \\"\\" in the namespace \\"default\\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}\n'

stwind commented 4 years ago

Hi @anayyar82, the error message is: pods is forbidden: User "system:serviceaccount:default:default" cannot watch resource "pods" in API group "" in the namespace "default". So I think you just need to give this service account (default in the default namespace) the watch permission on pods.
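A minimal sketch of that fix, assuming the scheduler keeps running as the default service account in the default namespace (both taken from the 403 message). It builds an RBAC Role and RoleBinding as plain dicts and prints them as a JSON List, which kubectl apply can read from stdin. The role name airflow-pod-manager is hypothetical, and only the watch verb is confirmed by the error; the other verbs are an assumption about what the executor also needs to launch and clean up worker pods.

```python
import json


def pod_rbac_manifests(namespace: str = "default",
                       service_account: str = "default",
                       role_name: str = "airflow-pod-manager") -> dict:
    """Build a Role + RoleBinding letting a service account work with pods.

    role_name is a hypothetical name; pick whatever fits your cluster.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            # "" is the core API group, where "pods" lives
            # (matches 'in API group ""' in the 403 message).
            "apiGroups": [""],
            "resources": ["pods"],
            # "watch" is what the 403 complains about; the other verbs are
            # an assumption about what the executor needs for worker pods.
            "verbs": ["get", "list", "watch", "create", "delete"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": service_account,
            "namespace": namespace,
        }],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    # A v1 List lets one `kubectl apply -f -` create both objects.
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    print(json.dumps(pod_rbac_manifests(), indent=2))
```

Assuming the script is saved as rbac.py (hypothetical name), `python rbac.py | kubectl apply -f -` applies it, and `kubectl auth can-i watch pods --as=system:serviceaccount:default:default -n default` should then answer yes. A dedicated service account would be tighter than widening default, but the scheduler pod would then have to be configured to use it.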

anayyar82 commented 4 years ago

Also, with this volume definition in the code:

"volumes": [{"name":"store","hostPath":{"path":"'$PWD/dags'","type":"Directory"}}]

I am getting the error below:

Warning FailedMount 7s (x5 over 15s) kubelet, minikube MountVolume.SetUp failed for volume "store" : hostPath type check failed: /Users/ankur.nayyar/Documents/airflow-aks-prod/dags is not a directory

Any idea how to resolve it? I am using minikube and kubectl on a Mac.

stwind commented 4 years ago

It seems like the /Users/ankur.nayyar/Documents/airflow-aks-prod/dags folder hasn't been created yet?
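If the folder is simply missing, creating it before starting the pod should be enough. The sketch below does that and fails the same way kubelet's "Directory" check does when something other than a directory sits at the path; the function name ensure_dags_dir is hypothetical. One caveat: a hostPath is resolved on the node that runs the pod, which with minikube is the minikube node rather than the Mac itself, so the folder also has to be visible inside that node (for example through minikube mount).

```python
import os


def ensure_dags_dir(base: str) -> str:
    """Create <base>/dags if it is missing and return its absolute path.

    Raises NotADirectoryError if a non-directory already occupies the path,
    which is the situation kubelet reports as "is not a directory".
    """
    dags = os.path.join(os.path.abspath(base), "dags")
    if os.path.exists(dags) and not os.path.isdir(dags):
        raise NotADirectoryError(f"{dags} is not a directory")
    # No-op when the directory already exists.
    os.makedirs(dags, exist_ok=True)
    return dags


if __name__ == "__main__":
    # Same location the thread uses: $PWD/dags.
    print(ensure_dags_dir(os.getcwd()))
```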

anayyar82 commented 4 years ago

Thanks, it seems to work if I change it to the following:

"volumes": [{"name":"store","hostPath":{"path":"'$PWD/dags'"}}]

Is it fine to remove the "type":"Directory" part?

stwind commented 4 years ago

Yeah, you can use any hostPath type that works for your environment.
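Dropping "type" falls back to the default (an empty string), under which kubelet performs no check before mounting; that is why the edited definition works. Another option is DirectoryOrCreate, which makes kubelet create an empty directory on the node when nothing exists at the path. The sketch below builds the same "store" volume entry with a chosen type and rejects values outside the documented set. Building the spec in Python also avoids splicing '$PWD/dags' into a single-quoted shell string; the helper name dags_volume is hypothetical.

```python
import json
import os

# hostPath "type" values documented for Kubernetes volumes.
# "" (or omitting the field) means no check is performed.
HOSTPATH_TYPES = {
    "", "DirectoryOrCreate", "Directory", "FileOrCreate",
    "File", "Socket", "CharDevice", "BlockDevice",
}


def dags_volume(path: str, host_type: str = "") -> dict:
    """Build the "store" hostPath volume entry with an optional type."""
    if host_type not in HOSTPATH_TYPES:
        raise ValueError(f"unknown hostPath type: {host_type!r}")
    host_path = {"path": path}
    if host_type:
        host_path["type"] = host_type
    # With host_type == "" the key is omitted, matching the working fix.
    return {"name": "store", "hostPath": host_path}


if __name__ == "__main__":
    dags = os.path.join(os.getcwd(), "dags")
    # DirectoryOrCreate: kubelet creates the directory on the node if missing.
    print(json.dumps({"volumes": [dags_volume(dags, "DirectoryOrCreate")]}))
```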

anayyar82 commented 4 years ago

Thanks a lot. Everything seems to be working now; the last error I am getting is:

[2020-04-02 15:47:58,990] {settings.py:253} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=29
[2020-04-02 15:48:23,369] {scheduler_job.py:1382} ERROR - Exception when executing execute_helper
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1380, in _execute
    self._execute_helper()
  File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1441, in _execute_helper
    if not self._validate_and_run_task_instances(simple_dag_bag=simple_dag_bag):
  File "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1503, in _validate_and_run_task_instances
    self.executor.heartbeat()
  File "/usr/local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 134, in heartbeat
    self.sync()
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 843, in sync
    results = self.result_queue.get_nowait()
  File "<string>", line 2, in get_nowait
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 818, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
[2020-04-02 15:48:23,678] {helpers.py:322} INFO - Sending Signals.SIGTERM to GPID 29
[2020-04-02 15:48:24,135] {helpers.py:288} INFO - Process psutil.Process(pid=281, status='terminated') (281) terminated with exit code None
[2020-04-02 15:48:24,140] {helpers.py:288} INFO - Process psutil.Process(pid=280, status='terminated') (280) terminated with exit code None
[2020-04-02 15:48:24,145] {helpers.py:288} INFO - Process psutil.Process(pid=29, status='terminated') (29) terminated with exit code 0
[2020-04-02 15:48:24,148] {scheduler_job.py:1385} INFO - Exited execute loop