villasv / aws-airflow-stack

Turbine: the bare metals that gets you Airflow
https://victor.villas/aws-airflow-stack/
MIT License

airflow missing logs SIGTERM error #194

Open adib-next opened 3 years ago

adib-next commented 3 years ago

Has anyone come across this issue, where the worker crashes/terminates with the following error?

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/celery/worker/worker.py", line 208, in start
    self.blueprint.start(self)
  File "/home/airflow/.local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/home/airflow/.local/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
    return self.obj.start()
  File "/home/airflow/.local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "/home/airflow/.local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/home/airflow/.local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 599, in start
    c.loop(*c.loop_args())
  File "/home/airflow/.local/lib/python3.6/site-packages/celery/worker/loops.py", line 83, in asynloop
    next(loop)
  File "/home/airflow/.local/lib/python3.6/site-packages/kombu/asynchronous/hub.py", line 308, in create_loop
    events = poll(poll_timeout)
  File "/home/airflow/.local/lib/python3.6/site-packages/kombu/utils/eventio.py", line 84, in poll
    return self._epoll.poll(timeout if timeout is not None else -1)
  File "/home/airflow/.local/lib/python3.6/site-packages/celery/apps/worker.py", line 285, in _handle_request
    raise exc(exitcode)
celery.exceptions.WorkerShutdown: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
    human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 15 (SIGTERM).
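For anyone reading the second traceback: "signal 15 (SIGTERM)" means the child process was killed by a SIGTERM sent from outside, not that the task raised an error. The sketch below reproduces that message with the standard library's `multiprocessing` rather than Celery or billiard themselves. It assumes a POSIX system, where `Process.terminate()` sends SIGTERM and a child killed by signal N ends up with `exitcode == -N`. billiard (the multiprocessing fork Celery uses) appears to follow the same convention, which is what its message reflects. `busy_task`, `describe_exit` and `run_demo` are hypothetical names for illustration only.

```python
import multiprocessing
import signal
import time


def busy_task() -> None:
    # Hypothetical stand-in for a long-running task; it never returns on its own.
    while True:
        time.sleep(0.1)


def describe_exit(exitcode: int) -> str:
    # multiprocessing reports a child killed by signal N as exitcode -N;
    # zero or positive values are ordinary exit statuses.
    if exitcode < 0:
        sig = signal.Signals(-exitcode)
        return f"signal {sig.value} ({sig.name})"
    return f"exit code {exitcode}"


def run_demo() -> str:
    child = multiprocessing.Process(target=busy_task)
    child.start()
    time.sleep(0.5)    # give the child a moment to start running
    child.terminate()  # on POSIX this sends SIGTERM to the child
    child.join()
    return describe_exit(child.exitcode)


if __name__ == "__main__":
    print(run_demo())  # on Linux: signal 15 (SIGTERM)
```

So the useful question is what sent the SIGTERM (a supervisor, an orchestrator, an instance being stopped), since the worker itself did not choose to exit.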

Has anyone also seen the logs from the workers go missing, like this?

Log file does not exist: /opt/airflow/....
Fetching from: http://airflow-worker-1.airflow-worker.default.svc.cluster.local....
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url:....
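Those log lines describe a two-step lookup: the local log file was not found, so the log was requested over HTTP from the worker, and the worker answered 404. A minimal sketch of that lookup order, assuming the URL pattern `http://<hostname>:<port>/log/<relative path>` and port 8793 (the usual default for Airflow's `worker_log_server_port`; check the value and setting location in your own Airflow version's config). `resolve_log_source` and the example paths are hypothetical, not Airflow's API.

```python
import os


def resolve_log_source(
    base_log_folder: str,
    relative_path: str,
    worker_hostname: str,
    worker_log_server_port: int = 8793,  # assumed default; verify in your config
) -> str:
    # Step 1: use the log file if it exists on this machine.
    local_path = os.path.join(base_log_folder, relative_path)
    if os.path.isfile(local_path):
        return local_path
    # Step 2: otherwise fall back to the worker's log server. A 404 from this URL
    # means the worker could not find the file at that path either.
    return f"http://{worker_hostname}:{worker_log_server_port}/log/{relative_path}"


if __name__ == "__main__":
    # Hypothetical folder and task path, for illustration only.
    print(resolve_log_source("/tmp/hypothetical-logs", "example_dag/example_task/1.log", "worker-1"))
```

If a worker process or machine is replaced after a SIGTERM, any log files it kept only on local disk would be gone, which would be consistent with both steps failing.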

villasv commented 3 years ago

I've never seen this before. My first guess would be that the workers are running low on resources.

What EC2 instance sizes are you using? Have you tried using a more capable one?
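Before resizing, one quick way to test the low-resources guess is to check how much memory the worker host has available around the time the task dies. A minimal sketch, assuming a Linux host whose `/proc/meminfo` has a `MemAvailable` line (present on modern kernels). Note that inside a container `/proc/meminfo` normally reflects the host, not the container's memory limit. The function names and the 10% `warn_below` threshold are hypothetical and arbitrary, not Airflow or AWS settings.

```python
import os


def mem_available_ratio(meminfo_text: str) -> float:
    # Each /proc/meminfo line looks like "MemAvailable:    2048000 kB".
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values["MemAvailable"] / values["MemTotal"]


def check_worker_memory(path: str = "/proc/meminfo", warn_below: float = 0.10) -> str:
    # warn_below is an arbitrary illustrative threshold.
    if not os.path.exists(path):
        return f"{path} not found (not Linux?)"
    with open(path) as f:
        ratio = mem_available_ratio(f.read())
    status = "LOW" if ratio < warn_below else "ok"
    return f"MemAvailable is {ratio:.0%} of MemTotal ({status})"


if __name__ == "__main__":
    print(check_worker_memory())
```

Running this periodically on a worker (or simply watching the instance's memory metrics) would show whether memory is actually tight before moving to a larger instance size.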