prio-data / views_pipeline

VIEWS forecasting pipeline for monthly prediction runs. Includes MLOps and QA for all models/ensembles.

Issue: Draft ADR 024 - Development and Production Sync #133

Closed: marinamatic closed this issue 1 week ago

Polichinel commented 2 weeks ago

ADR: 024 - Development and Production Sync

ADR Info | Details
Subject | Production and Development Branch Synchronization
ADR Number | 024
Status | Proposed
Author | [Author Name]
Date | 31.10.2024

Context

We aim to establish a new benchmark in MLOps for early warning systems (EWS), specifically for conflict forecasting, which demands high standards of reliability, transparency, and seamless update processes. Because EWS forecasts inform high-stakes decisions, the branching strategy must keep production stable while accommodating active, iterative development, and every update must be transparent and consistent.

To support continuous quality assurance, real-time monitoring, and rapid model updates, the synchronization between development and production branches must be structured to maintain reliability and performance.

This ADR defines the branching and synchronization structure necessary to support these requirements while adhering to MLOps best practices, ensuring the production branch remains stable and reliable for operational forecasting while allowing iterative improvements in development.

Decision

To meet these requirements, we will implement the following branching and synchronization strategy, optimized for the EWS pipeline:

Branch Structure and Sync Strategy

  1. Primary Branches

    • Production: Serves as the stable branch for all production-ready code and models. Only validated updates are merged here, ensuring production stability for high-stakes decision-making.
    • Development: Acts as the main integration branch for feature development, model updates, and experiment integration. All new features are developed in dedicated feature branches based on this branch and merged via Pull Requests (PRs) to ensure controlled updates and testing.
  2. Feature Branch Workflow

    • Feature branches are created off development for isolated testing of new features, models, or configurations.
    • Each feature branch undergoes rigorous PR reviews and automated testing to ensure compatibility, stability, and performance before merging into development. This approach maintains the stability of development, reducing errors upon merging to production.
  3. Syncing Development to Production

    • Periodic Pull Requests: At regular intervals (between weekly and monthly), development will be merged into production via a Pull Request once a full validation cycle is completed.
    • Staging Environment Validation: A staging environment replicates production settings to validate the integrity of development before merging into production. This includes running inference tests, drift detection, performance checks, and monitoring to detect issues pre-deployment, ensuring production stability.
  4. Hotfix Branches

    • For urgent issues in production, a hotfix branch is created directly from production. The fix is applied and tested on that branch, then merged back into production. Each hotfix is then backported to development to keep the two branches consistent.
  5. Versioning

    • Semantic Versioning: Each production release is tagged with semantic versioning (e.g., v1.0.0, v1.1.0) to facilitate traceability and rollback.
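
As an illustration of the versioning item above, the following is a minimal sketch of how release tags of the form v1.0.0 could be parsed, bumped, and used to pick a rollback target. The helper names (`parse_tag`, `bump`, `rollback_target`) are hypothetical and not part of the pipeline, and the idea that a regular sync bumps the minor number while a hotfix bumps the patch number is an assumption, not something this ADR specifies.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Version(NamedTuple):
    """A semantic version; tuple ordering gives correct version comparison."""
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"v{self.major}.{self.minor}.{self.patch}"


# Matches tags such as v1.0.0 or v1.10.2 (the format used in the ADR examples).
_TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Version:
    """Parse a release tag, raising ValueError for anything else."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return Version(*(int(part) for part in match.groups()))


def bump(version: Version, change: str) -> Version:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    if change == "major":
        return Version(version.major + 1, 0, 0)
    if change == "minor":
        # Assumption: a regular development-to-production sync is a minor release.
        return Version(version.major, version.minor + 1, 0)
    if change == "patch":
        # Assumption: a hotfix merged into production is a patch release.
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change!r}")


def rollback_target(tags: Iterable[str], current: str) -> Optional[str]:
    """Return the newest release tag older than `current`, or None if there is none."""
    current_version = parse_tag(current)
    older = [v for v in map(parse_tag, tags) if v < current_version]
    return str(max(older)) if older else None
```

Comparing parsed versions as integer tuples, rather than comparing tag strings, keeps v1.10.0 ordered after v1.2.0, which matters when choosing a rollback target.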

Consequences

Positive Effects:

Negative Effects:

Rationale

This branching and sync structure balances flexibility in development with reliability in production. By keeping the development and production branches separate and introducing a staging validation step, we ensure that production remains stable and capable of handling high-stakes forecasts while iterative work continues on the development branch. The addition of hotfix branches further reduces the risk of downtime due to critical issues in production.
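
The merge routes and the staging gate described in this ADR can be expressed as a small policy check. The sketch below is illustrative only: the `feature/` and `hotfix/` branch-name prefixes, the `MergeRequest` fields, and the `merge_decision` function are assumptions introduced for the example, since the ADR does not fix a naming convention or a tool for enforcing the rules.

```python
from dataclasses import dataclass
from typing import Tuple

# Permitted (source kind, target branch) routes, following the ADR's decision section.
ALLOWED_ROUTES = {
    ("feature", "development"),     # feature branches merge into development via PR
    ("development", "production"),  # periodic sync after a full validation cycle
    ("hotfix", "production"),       # urgent fixes branch from and return to production
    ("hotfix", "development"),      # hotfixes are backported to development
}


@dataclass(frozen=True)
class MergeRequest:
    source: str                      # source branch name
    target: str                      # target branch name
    checks_passed: bool              # PR review and automated tests succeeded
    staging_validated: bool = False  # staging environment validation succeeded


def branch_kind(name: str) -> str:
    """Classify a branch by name (assumed 'feature/...' and 'hotfix/...' prefixes)."""
    if name in ("development", "production"):
        return name
    prefix = name.split("/", 1)[0]
    return prefix if prefix in ("feature", "hotfix") else "unknown"


def merge_decision(request: MergeRequest) -> Tuple[bool, str]:
    """Return (allowed, reason) for a proposed merge."""
    route = (branch_kind(request.source), request.target)
    if route not in ALLOWED_ROUTES:
        return False, f"route {route[0]} -> {route[1]} is not permitted"
    if not request.checks_passed:
        return False, "PR review or automated tests have not passed"
    # Only the development-to-production sync requires staging validation.
    if route == ("development", "production") and not request.staging_validated:
        return False, "staging validation has not completed"
    return True, "ok"
```

For example, `merge_decision(MergeRequest("feature/new-model", "production", True))` is rejected because feature branches may only target development, while a development-to-production request is accepted only once both `checks_passed` and `staging_validated` are true.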

Considerations

Additional Notes

Feedback and Suggestions

Feedback is welcome on any additional sync requirements, monitoring tools, or branching conventions. Input on optimizing the staging environment and hotfix management process is also appreciated to ensure alignment with best practices.