Open velebak opened 4 years ago
Hi @velebak, if the task is queued on SQS, it should be a matter of time for the workers to pick it up. It might take a while if the minimum number of workers is 0 because autoscaling is a bit slow. If you do have a worker instance already, it probably failed to bootstrap.
Thanks for the reply. That's probably what it is with respect to the worker node not bootstrapping properly. I have some more digging to do there. Also, this project is awesome. Keep up the good work.
I'm still working on this, but I want to close the ticket for now, as it might be a while before I get back to it.
I ran into a similar issue and this might be what @velebak is experiencing.
On a worker system, this was in the logs:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 37, in <module>
args.func(args)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 75, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1089, in worker
worker = worker.worker(app=celery_app)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 781, in main
with self.make_context(prog_name, args, **extra) as ctx:
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 698, in make_context
ctx = Context(self, info_name=info_name, parent=parent, **extra)
TypeError: __init__() got an unexpected keyword argument 'app'
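The TypeError at the bottom of the traceback can be reproduced in isolation. The sketch below is not Click's actual code; it's a stand-in Context class illustrating the failure mode: Click's Context.__init__ has no 'app' parameter, so the extra keyword that Airflow 1.10.x forwards to Celery 5's Click-based CLI blows up before the worker even starts.

```python
# Stand-in for click.core.Context (illustration only, not Click's real class).
# Its __init__ accepts no 'app' keyword, mirroring the traceback above.
class Context:
    def __init__(self, command, info_name=None, parent=None):
        self.command = command
        self.info_name = info_name
        self.parent = parent


try:
    # Airflow 1.10.x passes app=celery_app through to the CLI; with Celery 5's
    # Click-based entry point, that keyword reaches Context.__init__ unexpectedly.
    Context(None, app="celery_app")
except TypeError as exc:
    # The message ends with: got an unexpected keyword argument 'app'
    print(exc)
```

This is why the crash happens at import/startup time rather than while running a task: the worker process dies before connecting to the broker at all.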
Running airflow worker produced the same error. It looks to be caused by this issue: https://github.com/apache/airflow/issues/11301
Celery was recently updated to version 5. On my test system, celery --version reported 5.0.2 (singularity). Uninstalling that version on the worker and installing 4.4.7 worked; the service immediately started back up and kicked off tasks.
sudo pip3 uninstall celery
sudo pip3 install celery==4.4.7
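To keep a freshly bootstrapped worker from pulling Celery 5 again, the dependency can be pinned wherever the instance installs its Python packages. A sketch of a requirements-file entry (assuming the workers install from a requirements file; adjust to however your AMI or bootstrap script installs dependencies):

```
# Stay on the Celery 4.x line, which Airflow 1.10.x's CLI is compatible with
celery>=4.4.7,<5
```

An exact pin (celery==4.4.7) works too, at the cost of missing 4.x bugfix releases.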
Urgh, unpinned version requirements. Thanks for digging into it!
I deployed this CloudFormation template without changes, and successfully got to the dashboard after adding a security group change.
I was able to write, deploy, and schedule my own DAG. It only queues the work (as evidenced by the messages waiting in SQS), but the workers never actually pick up and execute the job.
Are there other instructions I've overlooked to get the workers to actually run the work? The DAG works fine on a local single-node Airflow instance.
TIA