tomasfarias / airflow-dbt-python

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
https://airflow-dbt-python.readthedocs.io
MIT License

fix: Conditionally import required hook #99

Closed by adamantike 1 year ago

adamantike commented 1 year ago

This change makes the library import only the hook it actually needs, based on the remote scheme.
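The idea looks roughly like the following (a minimal sketch, not the exact diff from this PR; `DbtS3RemoteHook` and the scheme lists are assumptions, while `DbtGitRemoteHook` and the module layout come from the traceback below):

```python
# Sketch of get_remote in hooks/remote.py with per-scheme imports.
# Hook names other than DbtGitRemoteHook are illustrative assumptions.
def get_remote(scheme, conn_id=None):
    """Return the remote hook for `scheme`, importing only that hook's module."""
    if scheme == "s3":
        # Only the amazon provider is needed for S3 remotes.
        from .s3 import DbtS3RemoteHook  # hypothetical module/class name

        return DbtS3RemoteHook(conn_id)

    if scheme in ("git", "git+ssh", "ssh", "https"):
        # Only the ssh provider is needed for git remotes.
        from .git import DbtGitRemoteHook

        return DbtGitRemoteHook(conn_id)

    raise NotImplementedError(f"No remote hook implemented for scheme: {scheme}")
```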

As it is currently implemented, an environment that uses an S3 remote fails if it doesn't have the ssh Airflow provider installed:

Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt_python/hooks/dbt.py", line 126, in get_remote
    return self.remotes[(scheme, conn_id)]
KeyError: ('s3', None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt_python/hooks/dbt.py", line 300, in dbt_directory
    project_dir, profiles_dir = self.prepare_directory(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt_python/hooks/dbt.py", line 345, in prepare_directory
    project_dir_path = self.download_dbt_project(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt_python/hooks/dbt.py", line 158, in download_dbt_project
    remote = self.get_remote(scheme, self.project_conn_id)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt_python/hooks/dbt.py", line 128, in get_remote
    remote = get_remote(scheme, conn_id)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt_python/hooks/remote.py", line 147, in get_remote
    from .git import DbtGitRemoteHook
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt_python/hooks/git.py", line 5, in <module>
    from airflow.providers.ssh.hooks.ssh import SSHHook
ModuleNotFoundError: No module named 'airflow.providers.ssh'

The same happens in the opposite direction: an environment that uses a remote requiring the SSH hook fails if the aws Airflow provider is not installed.

tomasfarias commented 1 year ago

Nice, thanks for fixing this one. I'll put out a patch release with tag v1.0.1.

I keep introducing bugs around conditional imports, as my dev environment and the test environment tend to have all dependencies installed, so they never run into issues like this.

I'm wondering if we can configure the test matrix to run under a partial environment to catch bugs like this :thinking: I'll give it some thought over the next few days.

adamantike commented 1 year ago

Glad to add more contributions to this great project! :)

Regarding how to improve coverage and avoid these issues moving forward, I think the CI matrix can be extended to install different combinations of extras (the current CI configuration is effectively what we get when all extras are installed).
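As a purely hypothetical sketch (this repo may not use nox, and the extras names here are assumptions), a matrix of partial environments could look like this:

```python
# Hypothetical noxfile.py: run the test suite against different subsets of
# extras so that missing optional providers surface import errors in CI.
import nox


@nox.session(python=["3.10"])
@nox.parametrize("extras", ["", "amazon", "ssh", "amazon,ssh"])
def tests(session, extras):
    target = f".[{extras}]" if extras else "."
    session.install(target)
    session.install("pytest")
    session.run("pytest", "tests/")
```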

With that, tests can be gated with pytest.importorskip, so they are skipped rather than failing when an optional dependency is not installed.
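For example (a small sketch; the test itself is made up, but the module paths come from the traceback above):

```python
import pytest

# Skip this module's tests entirely when the ssh provider is not installed,
# instead of failing at import time.
pytest.importorskip("airflow.providers.ssh")


def test_git_remote_hook_is_importable():
    from airflow_dbt_python.hooks.git import DbtGitRemoteHook

    assert DbtGitRemoteHook is not None
```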

I'll try to implement a simple change to the CI that would have caught this issue.