puckel / docker-airflow

Docker Apache Airflow
Apache License 2.0

Task failures when combining scheduler container with locally installed worker. #170

Open mhousley opened 6 years ago

mhousley commented 6 years ago

I'm primarily filing this issue to document a solution, but I'd love input if a better solution exists. I'm combining five of the containers in docker-compose-CeleryExecutor.yml with a worker installed on the base OS of a GPU machine to allow execution of bash commands. I eventually plan to move to a cloud-hosted database, more workers, etc. The main point is that we're running Airflow in a heterogeneous setup where different workers have different environments.
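For reference, here is a rough sketch of how the host-installed worker gets pointed at the compose-hosted services. It assumes Airflow 1.10-style configuration keys, the default airflow/airflow credentials from docker-compose-CeleryExecutor.yml, and that the Redis and Postgres ports are published to the Docker host; `<docker-host>` below is a placeholder, not a value from my actual setup.

```sh
# Environment overrides for the host-installed Celery worker (sketch only).
# Exact key names vary by Airflow version; on 1.9 and earlier the result
# backend key is AIRFLOW__CELERY__CELERY_RESULT_BACKEND instead.
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@<docker-host>:5432/airflow
export AIRFLOW__CELERY__BROKER_URL=redis://<docker-host>:6379/1
export AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:airflow@<docker-host>:5432/airflow

# Start the worker on the GPU machine.
airflow worker
```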

I verified that the worker was communicating with Redis and Postgres. I could also successfully kick off individual tasks from the Airflow CLI on the worker machine; for example, airflow run example_bash_operator also_run_this 2018-01-01 would complete successfully. However, kicking off the full DAG, either from the CLI or from the web interface, led to task failures.

I found the following in the worker logs:

subprocess.CalledProcessError: Command 'airflow run example_bash_operator also_run_this 2018-04-10T17:34:49.664935 --local -sd /usr/local/airflow/dags/example_bash_operator.py' returned non-zero exit status 1

This was suspicious because my local DAG path was set to /local/mldev_runtime/airflow/dags/. After far too much wasted time, I realized that the Celery executor was generating the absolute DAG path inside the executor container and passing it back to the worker in its command. One can solve this by creating a symbolic link:

ln -s /home/mldev_runtime/airflow/dags /usr/local/airflow/dags
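For completeness, here is the workaround as commands, plus the alternative I would consider if a better solution exists: making the DAG path identical on both sides. The volume mapping below illustrates the idea and is not taken from my compose file.

```sh
# Workaround: make the scheduler container's DAG path resolve on the worker host.
sudo mkdir -p /usr/local/airflow
sudo ln -s /home/mldev_runtime/airflow/dags /usr/local/airflow/dags

# Alternative (untested here): avoid the mismatch entirely by using the same
# dags_folder everywhere, e.g. mount the host DAG directory into the
# scheduler/webserver containers at the path the worker already uses:
#   volumes:
#     - /home/mldev_runtime/airflow/dags:/usr/local/airflow/dags
# or set AIRFLOW__CORE__DAGS_FOLDER to the same value on both sides.
```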

ValeraKaravai commented 6 years ago

I have the same problem. Does anybody know a solution?