microsoft / dstoolkit-mlops-v2

This repository contains the basic repository structure for machine learning projects based on Azure technologies (Azure ML and Azure DevOps).

Introducing Model Factory

About this repo

This template provides a minimal set of scripts for setting up a development environment to train new models with the Azure ML SDK v2, using either Azure DevOps or GitHub Actions.

The template contains the following folders/files:

The template contains the following documents:

How to use the repo

Information about how to set up the repo is in the following document.

Local Execution

You can start training pipelines from your local computer by creating an environment based on the following instructions:

Caching Python Dependencies

Caching is used to store Python dependencies to improve build times by reusing packages between runs. The cache is managed using the Cache@2 task in the pipeline.

An example of how caching is implemented in this repo can be found in build_validation_pipeline.yml.

Understanding Cache Key, Cache Path, and Restore Keys

Cache key: python_build_validate | "$(Agent.OS)" | .azure-pipelines/requirements/build_validation_requirements.txt
Restore key: python_build_validate | "$(Agent.OS)"
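
As a minimal sketch (not the repo's actual build_validation_pipeline.yml), a Cache@2 step using the key and restore key shown above might look like the following. The PIP_CACHE_DIR variable, its location under $(Pipeline.Workspace), and the display names are assumptions made for illustration.

```yaml
variables:
  # Assumed cache location; pip reads PIP_CACHE_DIR to decide where to store downloads.
  PIP_CACHE_DIR: $(Pipeline.Workspace)/.pip

steps:
  - task: Cache@2
    displayName: Cache pip packages  # hypothetical name
    inputs:
      # Full key: changes when the agent OS or the requirements file content changes.
      key: 'python_build_validate | "$(Agent.OS)" | .azure-pipelines/requirements/build_validation_requirements.txt'
      # Fallback prefix tried when no cache matches the full key exactly.
      restoreKeys: |
        python_build_validate | "$(Agent.OS)"
      # Directory that is saved to and restored from the cache.
      path: $(PIP_CACHE_DIR)

  - script: pip install -r .azure-pipelines/requirements/build_validation_requirements.txt
    displayName: Install dependencies  # hypothetical name
```

With this layout, a change to the requirements file produces a new cache key, while the restore key still lets the run start from the most recent cache for the same OS.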

Variables Used

Running Debug Tasks in VS Code

You can use Visual Studio Code to run and debug specific tasks related to the MLOps pipelines. The following configurations are set up in the launch.json file, allowing you to execute various scripts with ease.

Available Debug Tasks

  1. Register Data Asset

    • Command: python -m mlops.common.register_data_asset --data_config_path config/data_config.json
    • Description: Registers a data asset using the provided configuration file.
  2. Start NYC Taxi Local Pipeline

    • Command: python -m mlops.nyc_taxi.start_local_pipeline --build_environment=<environment> --wait_for_completion=<True/False>
    • Description: Starts the NYC Taxi pipeline in a local environment. You will be prompted to specify the build_environment and whether the pipeline should wait for completion.
  3. Start London Taxi Local Pipeline

    • Command: python -m mlops.london_taxi.start_local_pipeline --build_environment=<environment> --wait_for_completion=<True/False>
    • Description: Starts the London Taxi pipeline in a local environment. You will be prompted to specify the build_environment and whether the pipeline should wait for completion.

How to Run

  1. Open the Debug panel in Visual Studio Code.
  2. Select the desired debug task from the dropdown list. The options are:
    • Register Data Asset
    • Start NYC Taxi Local Pipeline
    • Start London Taxi Local Pipeline
  3. Click the green play button next to the dropdown to start the task.
  4. For the NYC Taxi and London Taxi pipelines, you will be prompted to enter two values:
    • Build Environment: Choose from pr, dev, or any other configured environments.
    • Wait for Completion: Choose True if you want the pipeline to wait for completion before exiting, or False to allow it to run asynchronously.
  5. The output and any debugging information will be displayed in the Debug Console or Integrated Terminal, depending on the task configuration.

Build Validation Policies for Azure Repos Git

Limitation in Azure DevOps Pipelines

Azure Pipelines supports PR triggers in YAML configuration when the repository is hosted on GitHub, but not when it is hosted on Azure Repos Git. In other words, the pr: section in a YAML file works for GitHub repos but has no effect for Azure Repos Git.

Example of a PR trigger that works in GitHub, but not in Azure Repos:

pr:
  - master
  - develop

This limitation means that maintainers need to rely on branch policies in Azure Repos Git to enforce build validation, rather than configuring this directly in the pipeline YAML file.

Community Issue

The community has raised this issue, noting that Azure DevOps does not support the PR trigger natively for Azure Repos Git, which forces users to manage branch policies through the Azure DevOps UI rather than in YAML configuration. This adds administrative overhead, because maintainers must manage both the YAML pipeline definitions and policies that are configured outside the YAML files.

The community issue and thread discussion can be found here.

Alternative: Branch Policies for Build Validation

To enforce build validation in Azure Repos, branch policies provide a robust alternative. These policies allow more configuration options and are essential for protecting branches with mandatory builds before merging pull requests.

Follow the steps outlined in the Azure DevOps Branch Policies documentation to set up branch policies for build validation:

Note: You need to have appropriate permissions to create Build Validation Policies.

  1. Ensure you have created the Pipeline prior to creating the Build Validation Policy.

  2. Navigate to Branch Policies:

    • Go to your Azure DevOps "Project Settings".
    • Navigate to Repos > Repositories.
    • Select the Repository from the list.
    • Select the "Policies" tab.
    • Find the "Branch Policies" section and select the branch you want to set the policy for (e.g., development).
  3. Add Build Validation:

    • Under "Build validation", click "+" to add a build policy.
    • Select the pipeline you want to run when a PR is created or updated.
    • Configure the policy settings, such as requiring the build to pass before completing the PR.
    • Fill in the optional box for the paths (see the Set Path Policies section for more information).
    • Click "Save".

For more information about Build Validation Policies, please see the documentation.
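
Because the branch policy, not the YAML file, queues the run for pull requests in Azure Repos Git, a pipeline used only for build validation is often written with its CI trigger disabled. The following is a hedged sketch of such a pipeline, not the one shipped in this repo; the pool image and the placeholder step are assumptions.

```yaml
# Hypothetical build validation pipeline for a repository hosted on Azure Repos Git.
# No pr: section is defined, since it would have no effect here.
trigger: none  # disable CI runs on push; the branch policy queues the build for PRs

pool:
  vmImage: ubuntu-latest  # assumed Microsoft-hosted agent image

steps:
  - script: echo "Running build validation"
    displayName: Placeholder validation step  # replace with real lint/test steps
```

Once a pipeline like this exists, it can be selected in the "Build validation" policy described in the steps above.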

Set Path Policies

Each policy contains a list of paths that specify which files or directories should trigger the pipeline when changed. By defining these paths, we ensure that only the necessary pipelines are executed, reducing unnecessary builds and tests, and speeding up the overall CI/CD process. For example:
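
As a rough illustration of the same idea in YAML, for GitHub-hosted repositories where the pr trigger is honored, path scoping can be sketched as below. The directory names are hypothetical and only loosely modeled on the module names used in this README; in Azure Repos Git the equivalent paths are entered in the branch policy instead.

```yaml
# Sketch of a PR trigger with path filters (applies to GitHub-hosted repos only).
pr:
  branches:
    include:
      - master
      - develop
  paths:
    include:
      - mlops/nyc_taxi  # hypothetical path: run only when NYC Taxi pipeline code changes
      - mlops/common    # hypothetical path: shared code used by that pipeline
    exclude:
      - docs            # hypothetical path: documentation changes do not trigger a run
```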

Notes

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.