Stand-alone project that utilises public eCommerce data from Instacart to demonstrate how to schedule dbt models through Airflow.
For more Data & Analytics related reading, check https://analyticsmayhem.com
Change directory within the repository and run `docker-compose up`. Based on the `docker-compose.yml` definition, this will download the necessary images and start the services used throughout the project: the Postgres database that holds the seed data and the dbt models, Adminer (a lightweight database UI) and Airflow (webserver and scheduler).
Once everything is up and running, navigate to the Airflow UI (see connections above). You will be presented with the list of DAGs, all Off by default. You will need to switch them On and run them in the correct order: first the seed-data initialisation DAG, then the dbt models DAG (both DAG files are described further down).
If everything goes well, the daily model run should execute successfully and you should see task durations similar to the ones shown below.
Finally, within Adminer you can view the final models.
Useful `docker-compose` commands:

- `docker-compose up`: starts all services. Use `docker-compose up -d` to detach the terminal from the services' log.
- `docker-compose down`: stops the running services. Non-destructive operation.
- `docker-compose rm`: deletes all associated data. The database will be empty on the next run.
- `docker-compose build`: re-builds the containers based on the `docker-compose.yml` definition. Since only the Airflow service is based on local files, this is the only image that is re-built (useful if you apply changes to the `./scripts_airflow/init.sh` file).

If you need to connect to the running containers, use `docker-compose ps` to view the running services.
For example, to connect to the Airflow service, you can execute `docker exec -it dbt-airflow-docker_airflow_1 /bin/bash`. This will attach your terminal to the selected container and open a bash session.
Because the project directories (`./scripts_postgres`, `./sample_data`, `./dbt` and `./airflow`) are defined as volumes in `docker-compose.yml`, they are directly accessible from within the containers. This means:
- You can develop the dbt models locally: `cd` into `./dbt` and then `dbt compile`.
- Alternatively, attach to the Airflow container with `docker exec -it dbt-airflow-docker_airflow_1 /bin/bash`. This will open a session directly in the container running Airflow. Then `cd` into `/dbt` and run `dbt compile`. In general, attaching to the container helps a lot in debugging.
- When you change the dbt models, `dbt compile` them and on the next DAG update they will be available (beware of major changes that require `--full-refresh`). It is suggested to connect to the container (`docker exec ...`) to run a full refresh of the models. Alternatively, you can run `docker-compose down && docker-compose rm && docker-compose up`.
- `./airflow/dags` stores the DAG files. Changes on them appear after a few seconds in the Airflow admin.
- The `initialise_data.py` file contains the upfront data loading operation of the seed data (a rough sketch of such a loader is included below).
- The `dag.py` file contains all the handling of the dbt models. A key aspect is the parsing of `manifest.json`, which holds the models' tree structure and tag details (see the sketch below).

Credit to the very helpful repository: https://github.com/puckel/docker-airflow
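As a rough illustration of the kind of upfront seed-data load that `initialise_data.py` performs, the sketch below copies CSV files from the mounted `./sample_data` volume into Postgres with `psycopg2`. This is a minimal sketch under assumptions, not the project's actual code: the connection details, the idea that one target table per CSV file already exists, and the use of `copy_expert` are all illustrative choices.

```python
import glob
import os

import psycopg2  # assumed to be available where this runs

# Hypothetical connection details -- adjust to whatever docker-compose.yml defines.
conn = psycopg2.connect(host="postgres", dbname="dbt", user="dbt", password="dbt")

SAMPLE_DATA_DIR = "/sample_data"  # the ./sample_data volume mounted in the container

with conn:  # commits on success, rolls back on error
    with conn.cursor() as cur:
        for path in sorted(glob.glob(os.path.join(SAMPLE_DATA_DIR, "*.csv"))):
            # Assumes a table named after the file already exists with a matching schema.
            table = os.path.splitext(os.path.basename(path))[0]
            with open(path) as f:
                # COPY streams the whole CSV into Postgres in a single round trip.
                cur.copy_expert(f"COPY {table} FROM STDIN WITH CSV HEADER", f)
```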
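The sketch below illustrates the manifest-parsing idea behind `dag.py`: read `manifest.json`, create one Airflow task per dbt model and wire the tasks together using the dependency information stored in the manifest. Again this is a hedged sketch, not the project's actual DAG definition; the DAG id, schedule, paths and the use of `BashOperator` are assumptions made for illustration.

```python
import json

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

DBT_DIR = "/dbt"                                   # the ./dbt volume mounted in the container
MANIFEST_PATH = f"{DBT_DIR}/target/manifest.json"  # produced by `dbt compile`

# The manifest lists every dbt node together with its dependencies and tags.
with open(MANIFEST_PATH) as f:
    manifest = json.load(f)

dag = DAG(
    dag_id="dbt_models",          # hypothetical DAG id
    schedule_interval="@daily",
    start_date=days_ago(1),
)

# One Airflow task per dbt model, each running only that model.
# node["tags"] could be used here to select e.g. only the daily models.
tasks = {}
for node_id, node in manifest["nodes"].items():
    if node["resource_type"] != "model":
        continue
    tasks[node_id] = BashOperator(
        task_id=node["name"],
        bash_command=f"cd {DBT_DIR} && dbt run --models {node['name']}",
        dag=dag,
    )

# Mirror dbt's dependency tree in Airflow: a model runs only after its parents.
for node_id, task in tasks.items():
    for parent_id in manifest["nodes"][node_id]["depends_on"]["nodes"]:
        if parent_id in tasks:  # skip sources/seeds that have no task of their own
            tasks[parent_id] >> task
```

With this approach, adding or re-wiring dbt models does not require editing the DAG by hand; the task tree is rebuilt from `manifest.json` the next time Airflow parses the file.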