tomasfarias / airflow-dbt-python

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
https://airflow-dbt-python.readthedocs.io
MIT License
170 stars 35 forks source link

Extras installation is broken since v1.0.0 #101

Closed adamantike closed 1 year ago

adamantike commented 1 year ago

Description

With the changes introduced in v1.0.0 (specifically, in commit https://github.com/tomasfarias/airflow-dbt-python/commit/06e60258e9e1ff5e028b72047c406747b8f36924), specifying this package's extras does not install the optional dependencies.

It gets fixed when adding these lines back to the [tool.poetry.dependencies] section in pyproject.toml:

dbt-postgres = { version = ">=1.0.0", optional = true }
dbt-redshift = { version = ">=1.0.0", optional = true }
dbt-snowflake = { version = ">=1.0.0", optional = true }
dbt-bigquery = { version = ">=1.0.0", optional = true }
dbt-spark = { version = ">=1.0.0", optional = true }

This makes sense according to Poetry documentation:

The dependencies specified for each extra must already be defined as project dependencies. Dependencies listed in dependency groups cannot be specified as extras.

However, I haven't created a PR for this, because I see that commit also created a Poetry adapters group, which could get broken with this change. Related Poetry issue: https://github.com/python-poetry/poetry/issues/4891#issuecomment-993816696

Steps to reproduce

  1. Create a new virtualenv.
  2. Run pip install 'airflow-dbt-python[snowflake] == 1.0.0'.
  3. Check for the presence of dbt-snowflake, by running pip freeze | grep dbt-snowflake.

Additional context

Installing version 0.15.3 correctly installs provided extras:

$ pip install 'airflow-dbt-python[snowflake] == 0.15.3'
(...)
Successfully installed Babel-2.12.1 Jinja2-3.1.2 MarkupSafe-2.1.2 SecretStorage-3.3.3 agate-1.7.0 airflow-dbt-python-0.15.3 asn1crypto-1.5.1 attrs-22.2.0 betterproto-1.2.5 certifi-2022.12.7 cffi-1.15.1 charset-normalizer-2.1.1 click-8.1.3 colorama-0.4.6 cryptography-39.0.2 dbt-core-1.4.5 dbt-extractor-0.4.1 dbt-snowflake-1.4.2 filelock-3.10.0 future-0.18.3 grpclib-0.4.3 h2-4.1.0 hologram-0.0.15 hpack-4.0.0 hyperframe-6.0.1 idna-3.4 importlib-metadata-6.1.0 isodate-0.6.1 jaraco.classes-3.2.3 jeepney-0.8.0 jsonschema-3.2.0 keyring-23.13.1 leather-0.3.4 logbook-1.5.3 mashumaro-3.3.1 minimal-snowplow-tracker-0.0.2 more-itertools-9.1.0 msgpack-1.0.5 multidict-6.0.4 networkx-2.8.8 oscrypto-1.3.0 packaging-23.0 parsedatetime-2.4 pathspec-0.10.3 pyOpenSSL-23.0.0 pycparser-2.21 pycryptodomex-3.17 pyjwt-2.6.0 pyrsistent-0.19.3 python-dateutil-2.8.2 python-slugify-8.0.1 pytimeparse-1.1.8 pytz-2022.7.1 pyyaml-6.0 requests-2.28.2 six-1.16.0 snowflake-connector-python-3.0.1 sqlparse-0.4.3 stringcase-1.2.0 text-unidecode-1.3 typing-extensions-4.5.0 urllib3-1.26.15 werkzeug-2.2.3 zipp-3.15.0

$ pip freeze | grep dbt-snowflake
dbt-snowflake==1.4.2

For version 1.0.0, that's no longer the case:

$ pip install 'airflow-dbt-python[snowflake] == 1.0.0'
(...)
Successfully installed Babel-2.12.1 Jinja2-3.1.2 MarkupSafe-2.1.2 agate-1.7.0 airflow-dbt-python-1.0.0 attrs-22.2.0 betterproto-1.2.5 certifi-2022.12.7 cffi-1.15.1 charset-normalizer-3.1.0 click-8.1.3 colorama-0.4.6 dbt-core-1.4.5 dbt-extractor-0.4.1 future-0.18.3 grpclib-0.4.3 h2-4.1.0 hologram-0.0.15 hpack-4.0.0 hyperframe-6.0.1 idna-3.4 isodate-0.6.1 jsonschema-3.2.0 leather-0.3.4 logbook-1.5.3 mashumaro-3.3.1 minimal-snowplow-tracker-0.0.2 msgpack-1.0.5 multidict-6.0.4 networkx-2.8.8 packaging-23.0 parsedatetime-2.4 pathspec-0.10.3 pycparser-2.21 pyrsistent-0.19.3 python-dateutil-2.8.2 python-slugify-8.0.1 pytimeparse-1.1.8 pytz-2022.7.1 pyyaml-6.0 requests-2.28.2 six-1.16.0 sqlparse-0.4.3 stringcase-1.2.0 text-unidecode-1.3 typing-extensions-4.5.0 urllib3-1.26.15 werkzeug-2.2.3

$ pip freeze | grep dbt-snowflake
(no output)
tomasfarias commented 1 year ago

Thanks for the detailed report. I've opened a PR to address this issue: we can keep both the dependency group for tests/development and re-add the adapters back to the main dependencies group.

A bit of history on 06e6025:

The idea of dropping adapters from dependencies was to simplify version resolution: Airflow and dbt-core already contain a lot of dependencies and adding adapters on top of that was making things worse.

However, now that we have dropped support for Airflow 1, I'm not seeing any version conflicts, except for having to set an upper boundary on Python <4.0 (see bottom for a note on upper boundaries). And version resolution is not too bad (took about 3.5 minutes on my machine, when it used to take >40 minutes when supporting Airflow 1). I've been playing around with pdm as of late, so I'll test out if migrating to that can result in quicker version resolution.

In retrospective, I do regret including adapters as dependencies in the first place: I intended them as a shortcut when installing airflow-dbt-python, but I didn't realize at the time that we were placing the burden of resolving dependencies on airflow-dbt-python. Sometimes we can run airflow-dbt-python even with conflicts, as the conflict could be in dbt-core's CLI logic or other parts we don't access (I wish dbt-core offered a light library alternative, there are some ongoing efforts apparently, and dbt-core is particularly aggressive with pinning its dependencies sometimes). So it doesn't make much sense for us to carry that burden.

However, it has become a bit of a feature at this point and I rather not break existing users while it's not a big problem. That being said, as a feature, this is lacking proper testing...

A note on upper boundaries: I dislike upper boundaries except when completely necessary, even on something still far into the future as Python <4.0, but I recognize it's a small cost to pay for now.

As an example, before this PR we only had one upper boundary on pytest-postgresql = ">=3,<4" as newer versions of pytest-postgresql break our tests because they rely on psycopg3.