tomasfarias / airflow-dbt-python

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
https://airflow-dbt-python.readthedocs.io
MIT License

fix: Avoid heavy top-level imports in operators #85

Closed adamantike closed 1 year ago

adamantike commented 1 year ago

For complex DAGs, the import time added by this library can trigger DagBag import timeouts if the configured timeout is low enough. A documented Airflow best practice is to avoid heavy Python code that runs on DAG and Operator creation, and dbt imports are slow, based on profiling.
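As a rough illustration of the pattern (not the actual diff in this PR), a heavy import can be moved from module level into the method that needs it, so it only runs when the task executes. `LazyRunOperator` is a hypothetical stand-in class, and `json` is only a placeholder for a slow dependency such as dbt:

```python
class LazyRunOperator:
    """Hypothetical stand-in for an Airflow operator; not the library's real class."""

    def __init__(self, task_id: str) -> None:
        # Runs at DAG parse time, so it stays free of heavy imports.
        self.task_id = task_id

    def execute(self, context: dict) -> str:
        # Deferred import: the cost is paid on the worker when the task runs,
        # not while the scheduler loads the DagBag.
        import json  # placeholder for a slow import such as dbt

        return json.dumps({"task_id": self.task_id})
```

With this layout, importing the module that defines the operator no longer pulls in the slow dependency; only calling `execute` does.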

Profiling can be easily run locally, with the following command:

python -X importtime -c "from airflow_dbt_python.operators.dbt import DbtRunOperator" 2>import-times.log

The resulting log can then be parsed with a tool like tuna.
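For a quick look without a visualizer, the log can also be summarized with a few lines of Python. This is a minimal sketch that assumes the standard `-X importtime` line layout (`import time: self [us] | cumulative | imported package`). The `slowest_imports` helper and the sample log are hypothetical, and the numbers in the sample are invented for illustration, not measurements from this PR:

```python
def slowest_imports(log_text, top=5):
    """Return (module, self_us, cumulative_us) tuples sorted by cumulative time."""
    prefix = "import time:"
    rows = []
    for line in log_text.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (part.strip() for part in parts)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skips the header row
        rows.append((name, int(self_us), int(cumulative_us)))
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


# Invented sample in the importtime layout; not real profiling output.
SAMPLE_LOG = """\
import time: self [us] | cumulative | imported package
import time:       120 |        120 |   pkg_a.helpers
import time:       300 |        420 | pkg_a
import time:       900 |        900 | pkg_b
"""

if __name__ == "__main__":
    for name, self_us, cumulative_us in slowest_imports(SAMPLE_LOG, top=2):
        print(f"{name}: self={self_us}us cumulative={cumulative_us}us")
```

To use it on a real profile, read `import-times.log` and pass its contents as `log_text`.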

Before this change, the operator import took ~1.37s; this fix reduces it to ~0.25s.

It's important to note that, of those 0.25s, more than 80% (over 0.2s) is spent importing Airflow components, which are usually already loaded in DAGs, so this library's own import time for operators becomes insignificant.

tomasfarias commented 1 year ago

Changes look neat, thanks for your contribution.

> and dbt imports are slow, based on profiling

Fun story: sometimes dbt does network calls (!) when importing modules. This was a particular headache for me when I was trying to fetch the dbt version fast.
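One general way to read an installed package's version without importing the package itself is the standard library's `importlib.metadata`, which reads the installed distribution's metadata. This is only a sketch and not necessarily what was done in this project. The `installed_version` helper is hypothetical, and `dbt-core` is assumed to be the distribution name of interest:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        # Reads distribution metadata; does not import the package's modules.
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumes the distribution name is "dbt-core"; prints None if it is not installed.
    print(installed_version("dbt-core"))
```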

Anyways, if CI is green, I'll be happy to merge this and push out a patch release.

adamantike commented 1 year ago

Thank you for the quick review and release! You have done an excellent job building and supporting this project :)