puckel / docker-airflow

Docker Apache Airflow
Apache License 2.0

attach debugger to celery workers #223

Open · melchoir55 opened 6 years ago

melchoir55 commented 6 years ago

I'm developing operators that depend on one another through XCom, so they only make sense when run in the context of a DAG. I'd like to run that code with a debugger attached so I can set breakpoints. I've been using the Celery image because it closely mirrors production, but at least in PyCharm I'm having a lot of difficulty actually getting a debugger attached. My workflow has been to bring up the compose stack in PyCharm with the debugger attached to the worker, then kick off a DAG through the REST API. Unfortunately, I can't seem to trip breakpoints.

Do you recommend working outside of the Celery context for this sort of work? If so, how do you recommend attaching a debugger to whatever is executing the operator code?

jakubvedral commented 5 years ago

Hi, did you figure out how to make this work? I'm having a similar problem.

jackwellsxyz commented 4 years ago

Hi @puckel, @jakubvedral, @melchoir55 - the issue is that PyCharm passes its full debug command to the entrypoint script, whereas the docker-airflow entrypoint expects only the airflow subcommand (e.g. scheduler). For example, when you run an Airflow docker-compose configuration and want to override the scheduler to debug it, PyCharm runs this:

/usr/local/bin/docker-compose -f /Users/myuser/dev/data/docker-airflow/docker-compose.yml -f /Users/myuser/Library/Caches/PyCharm2019.3/tmp/docker-compose.override.108.yml up --exit-code-from scheduler --abort-on-container-exit scheduler

And when you dig into the docker-compose.override.108.yml file, you can see the command PyCharm sends as the container's CMD. Here's that file:

version: "2.1"
services:
  scheduler:
    command:
    - "python"
    - "-u"
    - "/opt/.pycharm_helpers/pydev/pydevd.py"
    - "--multiprocess"
    - "--qt-support=auto"
    - "--port"
    - "51994"
    - "--file"
    - "/usr/local/bin/airflow"
    - "scheduler"
    environment:
      PYTHONPATH: "/opt/project:/opt/.pycharm_helpers/pycharm_matplotlib_backend:/opt/.pycharm_helpers/pycharm_display:/opt/.pycharm_helpers/third_party/thriftpy:/opt/.pycharm_helpers/pydev:/Users/myuser/Library/Caches/PyCharm2019.3/cythonExtensions"
      PYTHONIOENCODING: "UTF-8"
      IDE_PROJECT_ROOTS: "/opt/project"
      IPYTHONENABLE: "True"
      PYTHONDONTWRITEBYTECODE: "1"
      LOAD_EX: "n"
      PYDEVD_LOAD_VALUES_ASYNC: "True"
      PYTHONUNBUFFERED: "1"
      LIBRARY_ROOTS: "/Users/myuser/Library/Caches/PyCharm2019.3/remote_sources/1007075957/1107771722:/Users/myuser/Library/Caches/PyCharm2019.3/remote_sources/1007075957/201544331:/Users/myuser/Library/Caches/PyCharm2019.3/remote_sources/1007075957/724150857:/Users/myuser/Library/Caches/PyCharm2019.3/remote_sources/1007075957/-154863933:/Users/myuser/Library/Caches/PyCharm2019.3/remote_sources/1007075957/-2006528493:/Users/myuser/Library/Caches/PyCharm2019.3/remote_sources/1007075957/1797547020:/Users/myuser/Library/Caches/PyCharm2019.3/remote_sources/1007075957/-1227933812:/Users/myuser/Library/Caches/PyCharm2019.3/remote_sources/1007075957/-125940560:/Users/myuser/Library/Caches/PyCharm2019.3/python_stubs/1007075957:/Applications/PyCharm.app/Contents/plugins/python/helpers/python-skeletons:/Applications/PyCharm.app/Contents/plugins/python/helpers/typeshed/stdlib/2:/Applications/PyCharm.app/Contents/plugins/python/helpers/typeshed/stdlib/2and3:/Applications/PyCharm.app/Contents/plugins/python/helpers/typeshed/third_party/2:/Applications/PyCharm.app/Contents/plugins/python/helpers/typeshed/third_party/2and3"
      EXECUTOR: "Celery"
      PYCHARM_HOSTED: "1"
      PYCHARM_DISPLAY_PORT: "63342"
    ports:
    - "0.0.0.0:51994:51994"
    stdin_open: true
    volumes:
    - "/Users/myuser/dev/data/docker-airflow/vigil-airflow:/opt/project:rw"
    volumes_from:
    - "container:dffa60cfc6831757f8c6d398ca577f14aedb7be1edabd72cdf6ccce840ef725a:ro"
    working_dir: "/opt/project"

Essentially, you'd need to refactor the entrypoint script to accept an arbitrary command instead of just an airflow subcommand. I'm happy to create a PR for this - though since it could well break Airflow for other users, it might be cleaner to give users an option to pick the right entrypoint script when launching the container. A minimal sketch of what that refactor could look like is below.
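
For illustration, a hypothetical sketch only (not this repo's actual entrypoint.sh): keep the usual airflow subcommand shortcuts, but fall through to executing any other command verbatim, so PyCharm's pydevd wrapper can launch airflow itself.

#!/usr/bin/env bash
# Hypothetical entrypoint sketch. Known airflow subcommands are still
# forwarded to airflow; anything else is executed as given.
case "$1" in
  webserver|scheduler|worker|flower|version)
    exec airflow "$@"
    ;;
  *)
    # Not an airflow subcommand (e.g. "python -u .../pydevd.py ...
    # --file /usr/local/bin/airflow scheduler"): run the command verbatim,
    # which lets the debugger wrapper start the scheduler process.
    exec "$@"
    ;;
esac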

wittfabian commented 4 years ago

Maybe the Debug Executor is what you are looking for. See: https://airflow.apache.org/docs/1.10.10/executor/debug.html
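
For anyone trying that route: per the linked docs, you switch the executor and then run the DAG file directly under your IDE's debugger. A rough sketch (the DAG path is made up, and the docs note the DAG file needs a __main__ block that calls dag.run()):

# Sketch only: assumes a DAG file at dags/my_dag.py whose __main__
# block calls dag.run(), as described in the linked docs.
export AIRFLOW__CORE__EXECUTOR=DebugExecutor
python dags/my_dag.py   # launch this under the IDE debugger to hit breakpoints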