I am unable to use `DbtRunOperator` with a private Git repository accessed over SSH. While I am not sure my syntax is correct, the error I am encountering leads me to believe the problem is not in how I am using the operator.
We are running

```python
from airflow_dbt_python.operators.dbt import DbtRunOperator

dbt_run = DbtRunOperator(
    dbt_conn_id="dbt-projects-github",  # Airflow connection to the private dbt-airflow GitHub repository
    task_id="dbt_run",
    project_dir="git+ssh://github.com/OrganizationName/dbt-airflow",
    # project_conn_id=db_conn,
    select=["+tag:daily"],
    exclude=["tag:deprecated"],
    target="db_conn",  # Airflow connection to the data warehouse
    # profile="my-project",
)
```
which results in the following error:

```
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 325, in dbt_directory
    store_profiles_dir,
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 369, in prepare_directory
    tmp_dir,
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 182, in download_dbt_project
    return remote.download_dbt_project(project_dir, destination)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/remote.py", line 73, in download_dbt_project
    self.download(source_url, destination_url)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/git.py", line 154, in download
    client, path = self.get_git_client_path(source)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/git.py", line 187, in get_git_client_path
    path = f"{url.netloc.split(':')[1]}/{str(url.path)}"
IndexError: list index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/operators/dbt.py", line 173, in execute
    **vars(self),
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 234, in run_dbt_task
    env_vars=env_vars,
  File "/usr/local/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow_dbt_python/hooks/dbt.py", line 330, in dbt_directory
    ) from e
airflow.exceptions.AirflowException: Failed to prepare temporary directory for dbt execution
```
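The failing expression from `get_git_client_path` can be reproduced outside Airflow with the standard library. This is a rough sketch under one assumption: that airflow-dbt-python's own URL type splits the netloc and path the same way `urllib.parse.urlparse` does for this input. `ssh_branch_path` is a hypothetical name used only for this reproduction.

```python
from urllib.parse import urlparse

# The project_dir value passed to DbtRunOperator in the task above.
PROJECT_DIR = "git+ssh://github.com/OrganizationName/dbt-airflow"


def ssh_branch_path(project_dir: str) -> str:
    """Hypothetical helper applying the expression from hooks/git.py, line 187.

    Assumption: the library's URL type splits netloc and path the same way
    urllib.parse.urlparse does for this input.
    """
    url = urlparse(project_dir)
    return f"{url.netloc.split(':')[1]}/{str(url.path)}"


url = urlparse(PROJECT_DIR)
print(url.scheme)  # git+ssh
print(url.netloc)  # github.com  (no ':' to split on)
print(url.path)    # /OrganizationName/dbt-airflow

try:
    ssh_branch_path(PROJECT_DIR)
except IndexError as exc:
    # "github.com".split(":") is ["github.com"], so index [1] does not exist.
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```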
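Here `url.netloc` is `github.com`, which contains no `:`, so `url.netloc.split(':')` returns a one-element list and indexing it with `[1]` raises the `IndexError`. Notably, if we had passed a GitHub repository URL to `project_dir` using either the `git` or the `http`/`https` scheme, the code would have run `path = str(url.path)` rather than `path = f"{url.netloc.split(':')[1]}/{str(url.path)}"`, and that second line appears to be the cause of the error.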
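For illustration, the SSH branch could derive the repository path without assuming a `:` is present in the netloc. The sketch below is a hypothetical helper (`git_repo_path` is not part of airflow-dbt-python), written with `urllib.parse.urlparse` instead of the library's URL type. It accepts both a slash after the host (`github.com/Org/repo`) and an SCP-like colon (`github.com:Org/repo`). One known limitation: a numeric port after the colon (for example `github.com:22`) would be misread as the organization name.

```python
from urllib.parse import urlparse


def git_repo_path(project_dir: str) -> str:
    """Hypothetical helper: return "Org/repo" from either SSH URL style.

    Handles git+ssh://github.com/Org/repo (slash after the host) and
    git+ssh://github.com:Org/repo (SCP-like colon after the host).
    """
    url = urlparse(project_dir)
    # partition() never raises: sep is "" when the netloc has no ':'.
    _host, sep, org = url.netloc.partition(":")
    path = url.path.lstrip("/")
    if sep:
        # SCP-like form: the text after ':' is the first path component.
        return f"{org}/{path}" if path else org
    # Plain form: the whole repository path is already in url.path.
    return path


print(git_repo_path("git+ssh://github.com/OrganizationName/dbt-airflow"))
print(git_repo_path("git+ssh://github.com:OrganizationName/dbt-airflow"))
```

Using `str.partition` rather than `split(':')[1]` is the main design choice: it returns an empty separator instead of raising when no colon exists, so both URL styles flow through one function.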
Could you provide any assistance with this? While we work through these errors, it would also be great to get some feedback on the discussion I opened about this topic.
Thank you very much!