I'm primarily filing this issue to document a solution, but I'd love input if a better solution exists.
I'm combining five of the containers in docker-compose-CeleryExecutor.yml with a worker installed on the base OS of a GPU machine, so that tasks can execute bash commands directly on that host. I eventually plan to move to a cloud-hosted database, more workers, etc. The main point is that we're running Airflow in a heterogeneous environment where different workers have varying environments.
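For context, here is a minimal sketch of how the bare-metal worker is pointed at the containerized services. The hostname, ports, and credentials are placeholders rather than my actual values, and I'm assuming the usual Airflow convention of overriding airflow.cfg entries with AIRFLOW__SECTION__KEY environment variables:

# Point the bare-metal worker at the Redis broker and Postgres metadata DB
# published by the docker-compose stack (connection strings illustrative).
export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@compose-host:5432/airflow
export AIRFLOW__CELERY__BROKER_URL=redis://compose-host:6379/1
export AIRFLOW__CELERY__CELERY_RESULT_BACKEND=db+postgresql://airflow:airflow@compose-host:5432/airflow
airflow worker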
I verified that the worker was communicating with Redis and Postgres. I could also successfully kick off tasks from the Airflow CLI on the worker machine, i.e.,
airflow run example_bash_operator also_run_this 2018-01-01
would complete successfully. However, kicking off the full DAG, whether from the web interface or from the CLI (e.g., airflow trigger_dag example_bash_operator), would lead to task failures. I found the following in the worker logs:
subprocess.CalledProcessError: Command 'airflow run example_bash_operator also_run_this 2018-04-10T17:34:49.664935 --local -sd /usr/local/airflow/dags/example_bash_operator.py' returned non-zero exit status 1
This was suspicious because my local DAG path on the worker was set to
/local/mldev_runtime/airflow/dags/
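As an aside, the path a worker is actually configured with is easy to confirm, assuming the default airflow.cfg location under AIRFLOW_HOME:

# Show the DAG folder this worker is actually configured with.
grep dags_folder $AIRFLOW_HOME/airflow.cfg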
After far too much wasted time, I realized that the Celery executor builds the airflow run command using the absolute DAG path as seen inside the scheduler/executor container, and the worker then executes that command verbatim. One can work around this by creating a symbolic link on the worker so that the container's path resolves there:
ln -s /home/mldev_runtime/airflow/dags /usr/local/airflow/dags
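With the link in place, the exact command from the failing log can be re-run on the worker as a sanity check (this is the command from my log above, so substitute your own DAG, task, and execution date):

# The container-side path should now resolve on the worker.
ls -l /usr/local/airflow/dags
airflow run example_bash_operator also_run_this 2018-04-10T17:34:49.664935 --local -sd /usr/local/airflow/dags/example_bash_operator.py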