In the real world, many problems are too complex to be solved by a single machine learning model. Whether you are predicting sales for each individual store, building predictive maintenance models for hundreds of oil wells, or tailoring an experience to individual users, building a model for each instance can improve results on many machine learning problems.
This pattern is common across a wide variety of industries and applies to many real-world use cases. Below are some examples where we have seen it used:
- Energy and utility companies building predictive maintenance models for thousands of oil wells, hundreds of wind turbines, or hundreds of smart meters
- Retail organizations building workforce optimization models for thousands of stores, campaign promotion propensity models, or price optimization models for the hundreds of thousands of products they sell
- Restaurant chains building demand forecasting models across thousands of restaurants
- Banks and financial institutions building cash replenishment models for individual ATMs and for groups of ATMs, or building personalized models for individuals
- Enterprises building revenue forecasting models for each division
- Document management companies building text analytics and legal document search models for each state
Azure Machine Learning (AML) makes it easy to train, operate, and manage hundreds or even thousands of models. This repo walks you through the end-to-end process of creating a many models solution, from training to scoring to monitoring.
To use this solution accelerator, all you need is access to an Azure subscription and an Azure Machine Learning Workspace that you'll create below.
While it's not required, a basic understanding of Azure Machine Learning will be helpful for understanding the solution. The following resources can help introduce you to AML:
Start by deploying the resources to Azure. The button below will deploy Azure Machine Learning and its related resources:
Next you'll need to configure your development environment for Azure Machine Learning. We recommend using a Notebook VM as it's the fastest way to get up and running. Follow the steps in EnvironmentSetup.md to create a Notebook VM and clone the repo onto it.
Once your development environment is set up, run through the Jupyter Notebooks sequentially following the steps outlined. By the end, you'll know how to train, score, and make predictions using the many models pattern on Azure Machine Learning.
There are two ways to train many models: with a custom training script or with Automated ML.
However, the steps needed to set the workspace up and prepare the datasets are the same no matter which option you choose.
In this repo, you'll train and score a forecasting model for each orange juice brand and for each store at a (simulated) grocery chain. By the end, you'll have used up to 11,973 models to forecast sales for the next few weeks.
The data used in this sample is simulated based on the Dominick's Orange Juice Dataset, sales data from a Chicago-area grocery store.
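To make the pattern concrete, here is a minimal sketch of "one model per store and brand" in plain Python. It is not the code used in the notebooks: the sample rows, the brand names, and the `partition` and `train` helpers are hypothetical, and the "model" is only a moving average standing in for a real forecaster.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical weekly sales rows: (store, brand, week, quantity).
rows = [
    (1001, "brand_a", 1, 120), (1001, "brand_a", 2, 130), (1001, "brand_a", 3, 125),
    (1001, "brand_b", 1, 80), (1001, "brand_b", 2, 90), (1001, "brand_b", 3, 85),
    (1002, "brand_a", 1, 200), (1002, "brand_a", 2, 210), (1002, "brand_a", 3, 190),
]


def partition(rows):
    """Split the data into one time-ordered series per (store, brand) pair."""
    groups = defaultdict(list)
    for store, brand, week, quantity in sorted(rows, key=lambda r: r[2]):
        groups[(store, brand)].append(quantity)
    return groups


def train(series, window=2):
    """Stand-in 'model': the mean of the last `window` observations."""
    return mean(series[-window:])


# One independent model per partition. Because no partition depends on
# another, this loop is what a service can run in parallel.
models = {key: train(series) for key, series in partition(rows).items()}

for key, forecast in sorted(models.items()):
    print(key, forecast)
```

Each partition is trained in isolation, which is why the approach scales out: the same loop body can be distributed across the nodes of a compute cluster.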
The functionality is split across notebooks that are designed to be run sequentially.
Notebook | Description |
---|---|
00_Setup_AML_Workspace.ipynb | Creates and configures the AML Workspace, including deploying a compute cluster for training. |
01_Data_Preparation.ipynb | Prepares the datasets that will be used during training and forecasting. |
The following notebooks are located under the `Custom_Script/` folder.
Notebook | Description |
---|---|
02_CustomScript_Training_Pipeline.ipynb | Creates a pipeline to train a model for each store and orange juice brand in the dataset using a custom script. |
03_CustomScript_Forecasting_Pipeline.ipynb | Creates a pipeline to forecast future orange juice sales using the models trained in the previous step. |
The following notebooks are located under the `Automated_ML/` folder.
Notebook | Description |
---|---|
02_AutoML_Training_Pipeline.ipynb | Creates a pipeline to train a model for each store and orange juice brand in the dataset using Automated ML. |
03_AutoML_Forecasting_Pipeline.ipynb | Creates a pipeline to forecast future orange juice sales using the models trained in the previous step. |
Watch these how-to videos for a step-by-step walkthrough of the many models solution accelerator and learn how to set up your models using both the custom training script and Automated ML.
ParallelRunStep enables the parallel training of models and is commonly used for batch inferencing. This document walks through some of the key concepts around ParallelRunStep.
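As a rough illustration, a ParallelRunStep entry script is built around two functions: `init()`, called once per worker process, and `run(mini_batch)`, called once per mini batch. The sketch below follows that shape using only the standard library so it can be run locally. The one-CSV-per-store-and-brand layout, the `quantity` column, the file name, and the moving-average "model" are assumptions for illustration, not the scripts shipped in this repo.

```python
import csv
import os
import tempfile
from statistics import mean


def init():
    """One-time setup per worker process (for example, reading settings)."""
    global WINDOW
    WINDOW = 2  # hypothetical setting: number of trailing weeks to average


def run(mini_batch):
    """Process one mini batch, assumed here to be a list of CSV file paths.

    Returns one result string per input file.
    """
    results = []
    for path in mini_batch:
        with open(path, newline="") as f:
            quantities = [float(row["quantity"]) for row in csv.DictReader(f)]
        forecast = mean(quantities[-WINDOW:])
        results.append(f"{os.path.basename(path)},{forecast}")
    return results


# Local driver that mimics what the service would do with a single file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store1001_brand_a.csv")  # hypothetical file name
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["week", "quantity"])
        writer.writerows([[1, 120], [2, 130], [3, 125]])
    init()
    output = run([path])

print(output)
```

In a real pipeline the service, not a local driver, decides which files land in each mini batch and which worker handles them, so `run` should not rely on state shared between mini batches.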
Pipelines allow you to create workflows in your machine learning projects. These workflows have a number of benefits including speed, simplicity, repeatability, and modularity.
Automated Machine Learning, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.
In addition to ParallelRunStep, Pipelines, and Automated Machine Learning, you'll also work with the following concepts: workspaces, datasets, compute targets, Python script steps, and Azure Open Datasets.
This project welcomes contributions and suggestions. To learn more visit the contributing section.
Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.