jostrm / azure-enterprise-scale-ml

Enterprise Scale ML (esml) - on Azure
MIT License

Project: azure-enterprise-scale-ml (ESML) AI Factory

The Enterprise Scale AI Factory is a plug-and-play solution that automates the provisioning, deployment, and management of AI projects on Azure, using a templated way of working.

Main purpose:

1) Marry multiple best practices & accelerators: It reuses multiple existing Microsoft accelerators, landing zone architectures, and best practices such as CAF (Cloud Adoption Framework) & WAF (Well-Architected Framework), and provides an end-2-end experience including Dev, Test, and Prod environments.

Public links/blogs for more info / usage

Documentation

The documentation is organized by ROLE, via doc series.

Doc series | Role | Focus | Details
10-19 | CoreTeam | Governance | Setup of AI Factory. Governance. Infrastructure, networking. Permissions
20-29 | CoreTeam | Usage | User onboarding & AI Factory usage. DataOps for the CoreTeam's data ingestion team
30-39 | ProjectTeam | Usage | Dashboard, Available Tools & Services, DataOps, MLOps, Access options to the private AIFactory
40-49 | All | FAQ | Various frequently asked questions. Please look here, before contacting an ESML AIFactory mentor.

It is also organized via the four components of the ESML AIFactory:

Component | In section | Focus in section | Role | Doc series
1) Infra:AIFactory | Y | - | CoreTeam | 10-19
2) Datalake template | Y | - | All | 20-29, 30-39
3) Templates for: DataOps, MLOps, *LLMOps | Y | - | All | 20-29, 30-39
4) Accelerators: ESML SDK (Python, PySpark), RAG Chatbot, etc | Y | - | ProjectTeam | 30-39

LINK to Documentation

Best practices implemented & benefits

NEWS TABLE

Date | Category | What | Link
2024-03 | Automation | Add core team member | 26-add-esml-coreteam-member.ps1
2024-03 | Automation | Add project member | 26-add-esml-project-member.ps1
2024-03 | Tutorial | Core-team tutorial | 10-AIFactory-infra-subscription-resourceproviders.md
2024-03 | Tutorial | End-user tutorial | 01-jumphost-vm-bastion-access.md
2024-03 | Tutorial | End-user tutorial | 03-use_cases-where_to_start.md
2024-02 | Tutorial | End-user installation: Compute Instance | R01-install-azureml-sdk-v1+v2.m
2024-02 | Datalake - Onboarding | Auto-ACL on PROJECT folder in lake | -
2023-03 | Networking | No Public IP: Virtual private cloud - updated networking rules | https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-secure-workspace-vnet?view=azureml-api-1&preserve-view=true&tabs=required%2Cpe%2Ccli
2023-02 | ESML Pipeline templates | Azure Databricks: Training and Batch pipeline templates. 100% same support as AML pipeline templates (inner/outer loop MLOps) | -
2022-10 | ESML MLOps | ESML MLOps v3 advanced mode, support for Spark steps (Databricks notebooks / DatabricksStep) | -
2022-08 | ESML infra (IaC) | Bicep now supports YAML as well | -
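The "Auto-ACL on PROJECT folder" entry is about granting project members access to their own lake folder. As a rough illustration of what such automation has to produce, the sketch below builds an Azure Data Lake Storage Gen2 ACL spec string (comma-separated `[default:]user|group:<object id>:<perms>` entries). The `AclGrant` class, `build_project_acl`, and the object IDs are hypothetical and not part of ESML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclGrant:
    """One hypothetical grant: an Entra ID object ID plus POSIX-style permissions."""
    object_id: str
    permissions: str  # e.g. "rwx", "r-x", "r--"
    kind: str = "user"  # "user" or "group"


def build_project_acl(grants: list[AclGrant], include_default: bool = True) -> str:
    """Build a comma-separated ADLS Gen2 ACL spec for a project folder.

    Each grant yields an access entry; with include_default=True it also yields
    a matching "default:" entry, which new child items under the folder inherit.
    """
    entries = []
    for g in grants:
        # Each position must be the expected letter or "-": r/-, w/-, x/-.
        if len(g.permissions) != 3 or not all(
            c in (p, "-") for c, p in zip(g.permissions, "rwx")
        ):
            raise ValueError(f"invalid permissions: {g.permissions!r}")
        if g.kind not in ("user", "group"):
            raise ValueError(f"invalid kind: {g.kind!r}")
        entry = f"{g.kind}:{g.object_id}:{g.permissions}"
        entries.append(entry)
        if include_default:
            entries.append(f"default:{entry}")
    return ",".join(entries)
```

For example, `build_project_acl([AclGrant("1111", "rwx"), AclGrant("2222", "r-x", "group")])` returns `"user:1111:rwx,default:user:1111:rwx,group:2222:r-x,default:group:2222:r-x"`. A string of this shape could then be applied with `set_access_control_recursive(acl=...)` on a `DataLakeDirectoryClient` from the `azure-storage-file-datalake` package.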

TEMPLATES for PIPELINES (TRAINING & INFERENCE pipelines) is 1 of 5 template types in ESML:

THE Challenge

When innovating with AI and Machine Learning, multiple voices expressed the need for an Enterprise Scale AI & Machine Learning Platform with end-2-end turnkey DataOps and MLOps. Other requirements were an enterprise datalake design, able to share refined data across the organization, and high security and robustness: generally available (GA) technology only, and vNet support for pipelines & data with private endpoints. In short: a secure platform, with a factory approach to building models.

Even though best practices exist, it can be time-consuming and complex to set up such an AI Factory solution. When designing an analytical solution, a private solution without public internet access is often desired, since working with production data from day one (e.g. already in the R&D phase) is common. Cyber security around this is important.

THE Strategy

To meet the requirements & challenge, multiple best practices needed to be married and implemented, such as: CAF/WAF, MLOps, Datalake design, AI Factory, Microsoft Intelligent Data Platform / Modern Data Architecture. An open-source initiative could address all of these at once: this open-source accelerator, Enterprise Scale ML (ESML), to get an AI Factory on Azure.

THE Solution - TEMPLATES & Accelerator

ESML provides an AI Factory faster (within 4-40 hours), with 1-250 ESMLProjects. An ESML Project is a set of Azure services glued together securely.

ESML's 4 main components:

ESML AI Factory - 4-step process:

ESML AI Factory "Oneslider": Dev, Test, Prod environments - Enterprise Scale Landing Zones

The Azure DevOps/Bicep automation can optionally integrate with an ITSM system, as a "ticket" in ServiceNow/Remedy/Jira Service Desk. The information below is needed for ESML provisioning:

ESML Architecture - "Modern data analytics platform"

Based on this reference architecture: https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-databricks-modern-analytics-architecture

Contributing to ESML AIFactory?

This repository is a push-only mirror. Ping Joakim Åström for contributions / ideas.

Because of the mirror-only design, pull requests are not possible, except for ESML admins. See the license file (open source, MIT license). Speaking of open source, contributors:

Q: Is this for you? DataOps married with MLOps? What are the benefits of the ESML Controlplane SDK?

INTRO - Is this for you: refine data? AutoML or manual ML? R&D phase?

Q1: I want to use Azure AutoML, with MLOps ready to be turned ON, and with a datalake design automatically generated for me, including the BRONZE, SILVER, GOLD concept.

Q2: I want to do ML, but only in the R&D phase; I don't need MLOps or DEV, TEST, PROD environments. Can I still get the benefits of ESML, such as a quick DEV environment & AutoLake?

Q3: I want to do ML, but NOT AutoML; just scikit-learn and my own model. Can I still leverage ESML, beyond the training step?

Q: How was this accelerator born, and what is it based on? Is this for me?

Note: You can use this for any enterprise-grade solution that needs a single- or multi-subscription setup, whether the need is an enterprise datalake, DEV only, or DEV->TEST->PROD.

Q6 ESML AI Factory: Can I just use the Azure ML SDK directly, instead of the ESML SDK?

ESMLPipelineFactory

Azure ML is great (https://azure.microsoft.com/en-us/services/machine-learning/#features), and the ESMLPipelineFactory improves pipeline creation on top of it, with 90% fewer lines of code.
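Much of the saving from a pipeline factory comes from templating: steps that training and batch-scoring pipelines share are defined once instead of being hand-wired per project. The sketch below illustrates only that idea. `PipelineType`, the step names, and `create_pipeline_steps` are hypothetical and are not the ESML SDK API.

```python
from enum import Enum


class PipelineType(Enum):
    TRAINING = "training"
    BATCH_SCORING = "batch_scoring"


# Hypothetical data-refinement steps shared by both pipeline types.
_SHARED_PREP = ["in_2_bronze", "bronze_2_silver", "silver_merged_2_gold"]

# Each template = shared prep + type-specific tail steps.
_TEMPLATES = {
    PipelineType.TRAINING: _SHARED_PREP + ["train_split_and_register"],
    PipelineType.BATCH_SCORING: _SHARED_PREP + ["score_gold", "write_scored_to_lake"],
}


def create_pipeline_steps(pipeline_type: PipelineType, model_folder: str) -> list[str]:
    """Return the ordered step names for a templated pipeline, scoped to one model folder."""
    return [f"{model_folder}/{step}" for step in _TEMPLATES[pipeline_type]]
```

Adding a new model then means one call per pipeline type rather than redefining every step, which is where a factory approach reduces code.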

I love being asked to push the boundaries, and the asks were dropping in from multiple places:

WHAT is ESML Autolake™ (Azure Data Lake Storage Gen2 accelerator)
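The point of an automatically generated lake design (the BRONZE, SILVER, GOLD concept from Q1) is that every project and model gets the same predictable folder layout. The sketch below shows one way to generate such paths. The `projects/<project>/<model>/<layer>/<environment>` layout and the function names are assumptions for illustration, not the actual Autolake structure.

```python
from pathlib import PurePosixPath

# Refinement layers from the BRONZE/SILVER/GOLD concept, in processing order.
LAYERS = ("bronze", "silver", "gold")


def lake_path(project: str, model: str, layer: str, environment: str = "dev") -> str:
    """Return a hypothetical lake folder: projects/<project>/<model>/<layer>/<environment>."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}, expected one of {LAYERS}")
    return str(PurePosixPath("projects", project, model, layer, environment))


def model_layout(project: str, model: str, environment: str = "dev") -> dict[str, str]:
    """Generate all layer folders for one model, so nobody hand-crafts paths."""
    return {layer: lake_path(project, model, layer, environment) for layer in LAYERS}
```

For example, `lake_path("project001", "model01", "gold")` returns `"projects/project001/model01/gold/dev"`.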

MLOps - Example: compare the model in the DEV subscription with the model in the TEST subscription
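At its core, comparing a DEV model with the TEST model is a promotion decision on one chosen metric. The sketch below assumes both models' metrics have already been fetched into plain dictionaries; `should_promote`, its parameters, and the metric names are hypothetical.

```python
from typing import Optional


def should_promote(
    dev_metrics: dict[str, float],
    test_metrics: Optional[dict[str, float]],
    metric: str,
    higher_is_better: bool = True,
    min_gain: float = 0.0,
) -> bool:
    """Decide whether the newly trained DEV model should replace the TEST model."""
    # Nothing deployed in TEST yet (or no comparable metric): promote by default.
    if test_metrics is None or metric not in test_metrics:
        return True
    new, current = dev_metrics[metric], test_metrics[metric]
    # Express the improvement so that a positive gain always means "DEV is better".
    gain = new - current if higher_is_better else current - new
    return gain > min_gain
```

A `min_gain` above zero avoids promoting on noise-level improvements.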

TEST_SET Scoring to Azure ML Studio, as TAGS

Scoring Drift / Concept Drift, to promote a newly trained model (also as a step in the ESML MLOps pipeline)
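One simple way to use scoring drift as a promotion trigger is to compare the deployed model's score on fresh data with its score at training time, and promote a new model only when the old one has degraded and the new one beats it on the same data. The sketch below assumes a higher-is-better metric; the tolerance and function names are hypothetical, not ESML's drift logic.

```python
def relative_score_drop(baseline: float, current: float) -> float:
    """Fraction by which a higher-is-better score has dropped since the baseline."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (baseline - current) / abs(baseline)


def concept_drift_detected(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag drift when the score dropped by more than `tolerance` (e.g. 5%)."""
    return relative_score_drop(baseline, current) > tolerance


def should_promote_new_model(
    baseline: float, deployed_now: float, new_model: float, tolerance: float = 0.05
) -> bool:
    """Promote only if the deployed model drifted AND the new model scores better on the same data."""
    return concept_drift_detected(baseline, deployed_now, tolerance) and new_model > deployed_now
```

For example, a deployed model scoring 0.80 at training time and 0.70 now has dropped 12.5%, so a newly trained model scoring 0.78 would be promoted.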

Settings

Project/Model settings

The BEST Model - "according to YOU": Model_settings
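"Best according to YOU" means the selection rule comes from settings rather than being hard-coded. The sketch below picks the best run by a user-chosen primary metric and direction. `ModelSettings` and `best_run` are hypothetical stand-ins, not the actual Model_settings schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical user settings: which metric defines "best", and in which direction."""
    primary_metric: str
    higher_is_better: bool = True


def best_run(runs: dict[str, dict[str, float]], settings: ModelSettings) -> str:
    """Return the name of the run that is best according to the user's settings."""
    # Only runs that actually logged the primary metric are candidates.
    candidates = {
        name: metrics[settings.primary_metric]
        for name, metrics in runs.items()
        if settings.primary_metric in metrics
    }
    if not candidates:
        raise ValueError(f"no run logged {settings.primary_metric!r}")
    pick = max if settings.higher_is_better else min
    return pick(candidates, key=candidates.get)
```

The same runs can yield different "best" models: lowest RMSE and highest R2 need not be the same run.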

DEPLOY to AKS - realtime scoring
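Realtime endpoints in Azure ML, including those on AKS, call an entry script that defines `init()` (run once at startup) and `run(raw_data)` (run per request). The sketch below follows that convention with a dummy model so it runs anywhere. `_DummyModel` and the `{"data": [...]}` request shape are assumptions; a real script would load the registered model from the folder given by the `AZUREML_MODEL_DIR` environment variable.

```python
import json

model = None


class _DummyModel:
    """Hypothetical stand-in for a real trained model: sums each input row."""

    def predict(self, rows):
        return [sum(row) for row in rows]


def init():
    """Called once when the scoring container starts: load the model into memory."""
    global model
    # A real script would load the registered model from AZUREML_MODEL_DIR here.
    model = _DummyModel()


def run(raw_data: str) -> str:
    """Called per request: parse the JSON body, score it, and return JSON."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"result": model.predict(rows)})
    except (KeyError, TypeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so bad JSON lands here too.
        return json.dumps({"error": str(exc)})
```

Keeping model loading in `init()` means the cost is paid once per container, not once per request.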