unna97 / audio-annonation

Django app that allows you to annotate audio.
MIT License

Migration to Poetry for Package Management #31

Closed christopherkeim closed 8 months ago

christopherkeim commented 9 months ago

Description

This migrates the application's package management to use Poetry 1.5.1 and adds infrastructure for automated development environment setup for Ubuntu 20.04/22.04 machines.

Poetry

The Poetry configuration maintains parity with requirements.txt. Two optional dependency groups, dev and prod, have been created to handle the two variants of the psycopg2 PostgreSQL Python module.

The dev dependency group includes development modules for linting, formatting, and testing, as well as the precompiled psycopg2 binary for interfacing with PostgreSQL databases.

The prod dependency group includes the production version of psycopg2, which will be built from source. This is primarily for Docker image builds, where we will configure the image with the dependencies needed to build psycopg2.

For development, packages can be installed with:

poetry install --with dev

For production (most likely in the Docker build), dependencies can be installed with:

poetry install --with prod
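
Because the prod group builds psycopg2 from source, the image needs a C compiler plus the Python and libpq headers before that command runs. A minimal sketch, assuming a Debian/Ubuntu base image; the package names and the `install_prod_deps` helper are assumptions for illustration and are not taken from this repository:

```shell
#!/usr/bin/env bash
# Assumed Debian/Ubuntu package names for psycopg2's build prerequisites:
# a C compiler, Python header files, and libpq header files.
PSYCOPG2_BUILD_DEPS="gcc python3-dev libpq-dev"

# Hypothetical helper meant to be called during the image build.
# It is only defined here, not executed.
install_prod_deps() {
  apt-get update
  # Word splitting of the package list is intentional.
  apt-get install -y --no-install-recommends $PSYCOPG2_BUILD_DEPS
  poetry install --with prod
}
```

If Python 3.11 comes from a separate package source, the header package name may differ (for example a version-specific one), so the list above would need adjusting.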

Automated Development Environment Setup (Ubuntu 20.04/22.04)

The setup.sh script installs and configures:

  1. Python 3.11
  2. Poetry 1.5.1
  3. PostgreSQL 16
  4. Docker 24.0.6

Each command in the setup is idempotent.
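
To illustrate what idempotent means here, one common pattern is to check for a tool before installing it, so that re-running the script leaves an existing installation untouched. This is only a sketch of that pattern; `ensure_installed` is a hypothetical helper and not necessarily how setup.sh is written:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the actual setup.sh.
# Usage: ensure_installed <tool> <install command...>
# Runs the install command only when <tool> is not already on PATH.
ensure_installed() {
  local tool="$1"
  shift
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool already installed, skipping"
  else
    echo "installing $tool"
    "$@"
  fi
}

# 'sh' is already present, so the placeholder install command is skipped.
ensure_installed sh echo "install step would run here"
```

Running the same line a second time produces the same result, which is the property the setup relies on.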

After cloning this repository, the script can be run with:

bash setup.sh

Motivation and Context

Closes #28

How Has This Been Tested?

Testing on Ubuntu 20.04/22.04 passes with successful installation of system dependencies (Python 3.11, Poetry 1.5.1, PostgreSQL 16, Docker 24.0.6) and Python application dependencies defined in the pyproject.toml file using Poetry.

With these configurations, the server spins up successfully.

Testing with a local database also passes.

Dependencies Added:

The dependencies added during the migration are:

[tool.poetry.group.dev]
optional = true
[tool.poetry.group.dev.dependencies]

# DevOps
black = "^22.3.0"
pytest = "^7.4.0"
pytest-cov = "^4.1.0"
ruff = "^0.0.285"

# PostgreSQL psycopg2 pre-compiled binary
psycopg2-binary = "^2.9.9"

[tool.poetry.group.prod]
optional = true
[tool.poetry.group.prod.dependencies]

# PostgreSQL psycopg2 source build
psycopg2 = "^2.9.9"

Types of changes

Checklist:

@unna97

unna97 commented 8 months ago

@christopherkeim Thank you so much for this PR. Can you please tag me once you want me to review it?

christopherkeim commented 8 months ago

> @christopherkeim Thank you so much for this PR. Can you please tag me once you want me to review it?

Yes absolutely 🚀 .

christopherkeim commented 8 months ago

> @christopherkeim If possible can you add windows equivalent set-up files & github actions for tests on each platform?

It's definitely possible to write a PowerShell script to handle automated Windows development environment setup, but there are two considerations to make:

  1. This produces two separate pieces of infrastructure that must be maintained in parallel
  2. Is our deployment target Linux or Windows, or both? For packaging ML/DL services, containerizing them upfront gives us the most flexibility across platforms. The setup.sh script lets us test source code during development, serves as a prototype for system dependencies, and gives us flexibility for VM deployment, but the end dependency for our server is likely going to be Docker / Docker Compose

For cross-platform CI (Windows and Linux), it's possible to create a matrix build that spins up two separate VMs in parallel and tests source code installation, linting, and functionality on each operating system. In terms of resources, this requires:

  1. Two separate pieces of infrastructure (setup.sh and setup.ps1) that must be maintained in parallel as above
  2. 2x compute for CI
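
As a rough sketch of what each job in such a matrix might run, the steps below install the dev group and then lint, check formatting, and test. The tool choice follows the dev dependencies listed in this PR, but the exact commands, the `run` wrapper, and the `DRY_RUN` variable are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch of the checks a single CI job might run on each OS in the matrix.
# Assumption: ruff, black, pytest, and pytest-cov come from the dev group;
# the project's real CI commands are not stated in this thread.
set -e

# Hypothetical wrapper: with DRY_RUN=1 the command is printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ci_checks() {
  run poetry install --with dev      # install app + dev dependencies
  run poetry run ruff check .        # linting
  run poetry run black --check .     # formatting check, no rewrite
  run poetry run pytest --cov        # tests with coverage
}

# Print the plan without needing Poetry installed.
DRY_RUN=1 ci_checks
```

Keeping the checks in one script means the Linux and Windows jobs would differ only in how the environment is set up, not in what gets verified.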

Overall my recommendation is to maintain as much parity with our deployment target environment as possible, where containerization gives us streamlined flexibility for execution across platforms (Linux, Windows, Darwin) - but it depends on what your aims are:

  1. Linux server deployment OR Windows server deployment (no containerization) - maintain single infrastructure configuration for development and deployment
  2. Linux server deployment AND Windows server deployment (no containerization) - maintain multiple infrastructure configurations for development and deployment (setup.sh and setup.ps1, which I could pull into our CI matrix build GitHub Action)
  3. Containerization for platform-independent deployment - maintain single infrastructure configuration for development (injecting this into the image) and streamlined configuration for deployment (Docker / Docker Compose)

unna97 commented 8 months ago

  1. I was hoping to maintain infrastructure for both Windows and Linux
  2. I believe the deployment target is Linux. But containerizing is important for deployment as I plan on using fly