microsoft / mlops-aisearch-pull

A template that shows how to set up MLOps in Azure AI Search using a pull approach
MIT License

MLOps Template for Azure AI Search: Pull approach

This repository demonstrates how to implement a Machine Learning Development and Operations (MLOps) process for Azure AI Search applications that use a pull model to index data. It creates an indexer with two custom skills. The indexer pulls PDF documents from a blob storage container, chunks them, creates embeddings for the chunks, and then adds the chunks to an index. Finally, the repository performs search evaluation for a collection of data and uploads the results to an AI Studio project, so that evaluations can be compared across multiple runs and the custom skills can be improved over time.

Technical Requirements

Folder Structure

Below are some key folders within the project:

Additionally, the root folder contains some important files:

Local Execution

The deployment scripts and GitHub workflows use the git branch name to create a unique naming scheme for all of the deployed entities.
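As a rough illustration of branch-based naming, the sketch below turns a branch name into a suffix that is safe to embed in resource names. The function name `resource_suffix`, the 20-character limit, and the exact character rules are assumptions made for this example; the repository's own scheme may differ.

```python
import re


def resource_suffix(branch_name: str, max_length: int = 20) -> str:
    """Turn a git branch name into a lowercase, hyphen-separated suffix.

    Hypothetical helper: the real naming rules live in the deployment scripts.
    """
    # Replace every run of characters outside [a-z0-9] with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", branch_name.lower()).strip("-")
    # Truncate, then strip again so the suffix never ends with a hyphen.
    return slug[:max_length].strip("-")


if __name__ == "__main__":
    print(resource_suffix("feature/Add_Chunking-Skill"))
```

Deriving every entity name from one suffix like this keeps deployments from different branches from colliding and makes them easy to find again at cleanup time.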

Configuration

Upload test data

Sample PDFs are available in the data folder for indexer testing. To upload them to blob storage, run the following:

python -m mlops.deployment_scripts.upload_data

Deploy Skillset Functions

The following deployment script will deploy the custom skillset functions to a function app deployment slot and poll the functions until they are ready to be tested:

python -m mlops.deployment_scripts.deploy_azure_functions
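The polling step can be pictured as a loop with a deadline. This is a minimal sketch, not the script's actual logic: `wait_until_ready`, its parameters, and the idea of passing in a `check` callable (for example, one that calls a function endpoint and returns True on success) are all assumptions for illustration.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` repeatedly until it returns True or the timeout elapses.

    Hypothetical helper; `sleep` is injectable so the loop can be tested
    without real waiting.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    attempts = {"count": 0}

    def fake_check() -> bool:
        # Pretend the function becomes ready on the third probe.
        attempts["count"] += 1
        return attempts["count"] >= 3

    print(wait_until_ready(fake_check, interval_s=0.01))
```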

To test the two skillset functions after they are deployed, run the following script:

python -m mlops.deployment_scripts.run_functions
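For orientation, Azure AI Search calls a custom Web API skill with a JSON body containing a `values` array of records, each with a `recordId` and a `data` object, and expects a response of the same shape with optional `errors` and `warnings` per record. The sketch below shows a chunking skill that follows that contract. The input field `text`, the output field `chunks`, the chunk sizes, and both function names are assumptions for this example and are not taken from the repository's skills.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final chunk is not just a repeated overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def run_chunk_skill(request_body: dict) -> dict:
    """Process a custom-skill request and return one result per input record."""
    results = []
    for record in request_body.get("values", []):
        out = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        text = record.get("data", {}).get("text")  # "text" is a hypothetical input name
        if isinstance(text, str):
            out["data"]["chunks"] = chunk_text(text)  # "chunks" is a hypothetical output name
        else:
            out["errors"].append({"message": "Missing 'text' input."})
        results.append(out)
    return {"values": results}


if __name__ == "__main__":
    body = {"values": [{"recordId": "1", "data": {"text": "hello world " * 30}}]}
    response = run_chunk_skill(body)
    print(len(response["values"][0]["data"]["chunks"]))
```

Returning a per-record error instead of raising keeps one bad document from failing the whole batch.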

More information about local development of skillset functions can be found in the custom skills readme.

Deploy Indexer

An indexer is composed of four entities: index, datasource, skillset, and indexer. The configuration for each is defined by the files in mlops/acs_config. To deploy the indexer and start indexing the data in blob storage, run the following:

python -m mlops.deployment_scripts.build_indexer
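Because an indexer definition refers to its datasource, index, and skillset by name, those three need to exist before the indexer itself is created. The sketch below only builds the REST URLs for that order and sends nothing. It assumes the entities are created with PUT requests against the Azure AI Search REST API; the `{suffix}-{entity}` naming, the `2023-11-01` API version, and the function name are illustrative assumptions, and the repository's script may use an SDK or different names.

```python
from typing import List
from urllib.parse import quote

# (entity label, REST collection) pairs; the indexer is last because it
# references the other three by name.
DEPLOY_ORDER = [
    ("datasource", "datasources"),
    ("index", "indexes"),
    ("skillset", "skillsets"),
    ("indexer", "indexers"),
]


def build_put_urls(service_name: str, suffix: str, api_version: str = "2023-11-01") -> List[str]:
    """Return the create-or-update URLs for the four entities, in deploy order."""
    base = f"https://{service_name}.search.windows.net"
    urls = []
    for entity, collection in DEPLOY_ORDER:
        name = f"{suffix}-{entity}"  # hypothetical naming scheme
        urls.append(f"{base}/{collection}/{quote(name)}?api-version={api_version}")
    return urls


if __name__ == "__main__":
    for url in build_put_urls("my-search-service", "feature-x"):
        print("PUT", url)
```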

Perform Search Evaluation

The following command performs search evaluation and uploads the results to the specified AI Studio project. For more information about evaluation, see the search evaluation readme.

python -m mlops.evaluation.search_evaluation --gt_path "./mlops/evaluation/data/search_evaluation_data.jsonl" --semantic_config my-semantic-config
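This README does not say which metrics the evaluation computes, so the following is only a generic sketch of scoring search results against a JSONL ground-truth file. It uses recall@k and mean reciprocal rank as stand-in metrics. The record fields `query` and `relevant_ids`, the `search_fn` callable, and all function names are assumptions for illustration.

```python
import json
from typing import Callable, Dict, Iterable, List


def recall_at_k(retrieved: List[str], relevant: List[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def reciprocal_rank(retrieved: List[str], relevant: List[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    relevant_set = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant_set:
            return 1.0 / rank
    return 0.0


def evaluate(gt_lines: Iterable[str], search_fn: Callable[[str], List[str]], k: int = 3) -> Dict[str, float]:
    """Average both metrics over every non-empty JSONL line."""
    recalls, ranks = [], []
    for line in gt_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # hypothetical fields: "query", "relevant_ids"
        retrieved = search_fn(record["query"])
        recalls.append(recall_at_k(retrieved, record["relevant_ids"], k))
        ranks.append(reciprocal_rank(retrieved, record["relevant_ids"]))
    n = len(recalls)
    return {
        "recall_at_k": sum(recalls) / n if n else 0.0,
        "mrr": sum(ranks) / n if n else 0.0,
    }


if __name__ == "__main__":
    ground_truth = ['{"query": "a", "relevant_ids": ["d1", "d2"]}']
    print(evaluate(ground_truth, lambda q: ["d1", "d3", "d2"], k=2))
```

Passing the search call in as `search_fn` keeps the scoring logic testable without a live search service.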

Cleanup Deployment

Since the git branch name was used to name the deployed entities, this deployment script can clean up everything for the current branch by deleting the deployment slot in the function app and the indexer entities.

python -m mlops.deployment_scripts.cleanup_pr

DevOps Pipelines

This project contains GitHub workflows for PR validation and Continuous Integration (CI).

The PR workflow executes quality checks using flake8 and unit tests. It then deploys the skillset functions to a deployment slot of the function app. Once the functions are deployed and tested, an indexer is deployed and all of the test data is ingested from blob storage. Search evaluation is run and uploaded to an AI Studio project.

The CI workflow executes a similar workflow to the PR workflow, but the skillset functions are deployed to the main function app, not a deployment slot.

For the cleanup step of the CI workflow to work correctly, the development branch from a pull request must not be deleted until the cleanup step has run.

Some variables and secrets must be provided to execute the GitHub workflows (primarily the same ones used in the .env file for local execution).
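A small pre-flight check can catch missing settings before a workflow or local run gets far. This is a minimal sketch: the helper name `missing_settings` is hypothetical, and the variable names in the usage example are placeholders rather than the names this project actually requires.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_settings(required: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the process environment; passing a dict makes testing easy.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Placeholder names, not the project's real settings.
    missing = missing_settings(["EXAMPLE_SEARCH_SERVICE", "EXAMPLE_STORAGE_ACCOUNT"])
    if missing:
        print("Missing settings:", ", ".join(missing))
```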

Related Projects

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.