This repository contains the code and infrastructure for the Totesys Data Engineering project, which aims to build a reliable and resilient data pipeline to extract, transform, and load data from an operational database into a data lake and data warehouse hosted on AWS.
Please find the documentation for the project and code here.
The primary objective of this project is to showcase skills and knowledge in Python, SQL, database modeling, AWS, operational practices, and Agile methodologies. The project involves the following key components:
- Data ingestion: a Python Lambda function that extracts data from the totesys database and stores it in an S3 "ingestion" bucket.

The flow diagram of the step function is shown below.
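Separately from the diagram, the ingestion component described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the helper names `build_ingestion_key` and `store_rows`, the `sales_order` table name, the bucket name, and the timestamped key layout are hypothetical and are not taken from the project's S3 storage specification. The only external call used is `put_object(Bucket=..., Key=..., Body=...)`, which a boto3 S3 client provides; an in-memory stand-in is used so the sketch runs without AWS credentials.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone


def build_ingestion_key(table_name: str, extracted_at: datetime) -> str:
    """Build a timestamped object key (hypothetical layout, not the project's spec)."""
    return (
        f"{table_name}/{extracted_at:%Y/%m/%d}/"
        f"{table_name}_{extracted_at:%H%M%S}.json"
    )


def store_rows(s3_client, bucket: str, table_name: str,
               rows: list[dict], extracted_at: datetime) -> str:
    """Serialise extracted rows to JSON and write them to the ingestion bucket.

    `s3_client` is anything exposing put_object(Bucket=, Key=, Body=),
    for example the object returned by boto3.client("s3").
    """
    key = build_ingestion_key(table_name, extracted_at)
    # default=str keeps non-JSON types (dates, decimals) from raising.
    body = json.dumps(rows, default=str).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


class InMemoryS3:
    """Stand-in for an S3 client, used only to exercise the sketch locally."""

    def __init__(self) -> None:
        self.objects: dict[tuple[str, str], bytes] = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> None:
        self.objects[(Bucket, Key)] = Body


client = InMemoryS3()
written_key = store_rows(
    client,
    bucket="ingestion-bucket",      # hypothetical bucket name
    table_name="sales_order",       # hypothetical table name
    rows=[{"id": 1, "units": 3}],
    extracted_at=datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc),
)
```

Passing the client in as a parameter, rather than creating it inside the function, is a design choice that keeps the function easy to unit test; the real Lambda may be structured differently.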
The repository is organized as follows:
.
├── Makefile
├── README.md
├── conventions
│   ├── ci-cd.md
│   ├── code-review.md
│   ├── docs-and-comments.md
│   ├── images
│   ├── pull-request.md
│   ├── terraform.md
│   └── testing.md
├── db
│   ├── connection.py
│   ├── data
│   ├── run_schema.py
│   ├── run_seed.py
│   ├── schema.sql
│   └── seed.py
├── dev-db-terraform
│   ├── dev_db.tf
│   ├── main.tf
│   └── ...
├── python
│   ├── src
│   └── tests
├── requirements.in
├── specifications
│   ├── Deliverance_ETL_architecture_diagram.png
│   ├── Deliverance_ETL_architecture_diagram.svg
│   ├── S3_Data_Storage_Specification.md
│   ├── ingestion_lambda_spec.md
│   ├── project_plan.md
│   ├── specifiction.md
│   └── processing_lambda_spec.md
└── terraform
    ├── data.tf
    ├── dev.tfvars
    ├── eventbridge.tf
    ├── iam.tf
    ├── lambda.tf
    ├── main.tf
    ├── prod.tfvars
    ├── s3.tf
    ├── test.tfvars
    └── variables.tf
- terraform/: Contains Terraform configuration files for provisioning the AWS infrastructure.
- python/: Contains the source code for the Python Lambda functions responsible for data ingestion, processing, and loading, along with unit tests.
- .github/workflows/: Contains GitHub Actions workflows for continuous integration and deployment.
- README.md: This file, which provides an overview of the project and instructions for setup and deployment.

To get started with the project, follow these steps:
1. Clone the repository: git clone https://github.com/your-username/totesys-data-engineering.git
2. Update the configuration in the terraform/ directory to match your AWS account and desired settings.
3. Run terraform init and terraform apply.
4. Configure the GitHub Actions workflows in the .github/workflows/ directory.

Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
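For the deployment step in the getting-started list (terraform init followed by terraform apply), the following Python sketch shows one way the command sequence could be assembled per environment. It assumes, based only on the file names dev.tfvars, test.tfvars and prod.tfvars in the repository tree, that each environment is selected with Terraform's `-var-file` option; the README does not confirm this. The function names `terraform_commands` and `deploy` are hypothetical.

```python
from __future__ import annotations

import subprocess

# Environment names inferred from the *.tfvars files in terraform/ (assumption).
ENVIRONMENTS = ("dev", "test", "prod")


def terraform_commands(environment: str) -> list[list[str]]:
    """Return the init and apply commands for one environment."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return [
        ["terraform", "init"],
        ["terraform", "apply", f"-var-file={environment}.tfvars"],
    ]


def deploy(environment: str, working_dir: str = "terraform") -> None:
    """Run the commands in order, stopping at the first failure.

    Requires the terraform CLI and AWS credentials; it is not called here.
    """
    for command in terraform_commands(environment):
        subprocess.run(command, cwd=working_dir, check=True)


dev_commands = terraform_commands("dev")
```

Calling `deploy("dev")` from the repository root would run both commands inside terraform/; `terraform apply` still prompts for confirmation unless it is approved interactively.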
This project is licensed under the MIT License.
For more information, refer to the documentation.