Using Airflow to implement our ETL pipelines.
There are several tools available to create a virtual environment in Python.
Below are the steps to manage a virtual environment using venv:
Create a Virtual Environment
To create a virtual environment, run the following command:
python -m venv venv
In this example, venv is the name of the virtual environment directory, but you can replace it with any name you prefer.
Activate the Virtual Environment
After creating the virtual environment, activate it using the following command:
source venv/bin/activate
Install Dependencies
After activating the virtual environment, you can install the required dependencies:
# Install airflow and dev dependencies
pip install -r requirements.txt -r requirements-dev.txt -c constraints-3.8.txt
# black conflicts with click, so install both separately with pinned versions
pip install black==19.10b0 click==7.1.2
Deactivate the Virtual Environment
When you're done working in the virtual environment, you can deactivate it with:
deactivate
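To confirm that activation or deactivation took effect, you can check the VIRTUAL_ENV variable, which the venv activate script sets and deactivate unsets. The following is a minimal sketch; the function name venv_status is hypothetical and not part of this project.

```shell
# Hypothetical helper (not part of this repository).
# Reports whether a virtual environment is active in the current shell,
# based on the VIRTUAL_ENV variable set by venv's activate script.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: $VIRTUAL_ENV"
  else
    echo "inactive"
  fi
}
```

Calling venv_status after source venv/bin/activate should print the path of the venv directory, and calling it again after deactivate should print inactive.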
For development or testing, run cp .env.template .env.staging. For production, run cp .env.template .env.production.
Follow the instructions in .env.<staging|production> and fill in your secrets.
If you are running the staging instance for development as a sandbox and do not need to access any specific third-party services, leaving .env.staging as-is should be fine.
Contact the maintainer if you don't have these secrets.
⚠ WARNING: About .env
Please do not use the .env file for local development, as it might affect the production tables.
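The copy step for the environment files can be wrapped so that it never overwrites a file you have already filled in with secrets. This is only a sketch: the function name init_env_file is hypothetical, and it assumes you run it from the directory that contains .env.template.

```shell
# Hypothetical helper (not part of this repository).
# Copies .env.template to .env.<staging|production> unless the target exists,
# so secrets that were already filled in are never overwritten.
init_env_file() {
  env_name="$1"
  case "$env_name" in
    staging|production) ;;
    *)
      echo "usage: init_env_file <staging|production>" >&2
      return 2
      ;;
  esac
  env_target=".env.$env_name"
  if [ -e "$env_target" ]; then
    echo "$env_target already exists, leaving it untouched"
    return 0
  fi
  cp .env.template "$env_target" || return 1
  echo "created $env_target"
}
```

For example, init_env_file staging creates .env.staging on the first run and leaves it alone on later runs.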
Set up the Authentication for GCP: https://googleapis.dev/python/google-api-core/latest/auth.html
After running gcloud auth application-default login, you will get a credentials file located at $HOME/.config/gcloud/application_default_credentials.json. If you have a key file instead, run export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json".
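The two credential options above can be combined into one check. The sketch below keeps an already exported GOOGLE_APPLICATION_CREDENTIALS, otherwise falls back to the file written by gcloud auth application-default login. The function name use_default_gcp_credentials is hypothetical, and exporting the variable for the default file is an assumption made for explicitness, not something the project requires.

```shell
# Hypothetical helper (not part of this repository).
# Prefers an explicitly exported key file; otherwise points
# GOOGLE_APPLICATION_CREDENTIALS at the gcloud default credentials file.
use_default_gcp_credentials() {
  adc_file="$HOME/.config/gcloud/application_default_credentials.json"
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "using existing GOOGLE_APPLICATION_CREDENTIALS"
  elif [ -f "$adc_file" ]; then
    export GOOGLE_APPLICATION_CREDENTIALS="$adc_file"
    echo "using $adc_file"
  else
    echo "no credentials found; run: gcloud auth application-default login" >&2
    return 1
  fi
}
```

Run use_default_gcp_credentials in the shell where you start the services; a non-zero exit status means neither a key file nor the default credentials file was found.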
If you are a developer 👨‍💻, please check the Contributing Guide.
If you are a maintainer 👨‍🔧, please check the Maintenance Guide.
For development/testing:
# Build the local dev/test image
make build-dev
# Start dev/test services
make deploy-dev
# Stop dev/test services
make down-dev
The difference between production and dev/test compose files is that the dev/test compose file uses a locally built image, while the production compose file uses the image from Docker Hub.
If you are an authorized maintainer, you can pull the image from the GCP Artifact Registry.
Your Docker client must first be configured to use the GCP Artifact Registry:
gcloud auth configure-docker asia-east1-docker.pkg.dev
Then, pull the image:
docker pull asia-east1-docker.pkg.dev/pycontw-225217/data-team/pycon-etl:{tag}
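The {tag} placeholder in the pull command can be filled in by a small helper that accepts only the four tags this section lists. This is a sketch under that assumption; the function name image_ref is hypothetical and not part of the project.

```shell
# Hypothetical helper (not part of this repository).
# Builds the full image reference for one of the documented tags
# and rejects anything else before docker is ever invoked.
image_ref() {
  image_tag="$1"
  case "$image_tag" in
    cache|test|staging|latest)
      echo "asia-east1-docker.pkg.dev/pycontw-225217/data-team/pycon-etl:$image_tag"
      ;;
    *)
      echo "unknown tag: $image_tag" >&2
      return 1
      ;;
  esac
}
```

With the function defined, docker pull "$(image_ref staging)" pulls the staging image, while a mistyped tag fails before any request reaches the registry.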
There are several tags available:
cache: cache the image for faster deployment
test: for testing purposes, including the test dependencies
staging: when pushing to the staging environment
latest: when pushing to the production environment
Please check the Production Deployment Guide.