jostrm / azure-enterprise-scale-ml

Enterprise Scale AIFactory (esml) - on Azure
MIT License
ai-ready-infrastructure dataops enterprise-scale-landingzone genai llmops mlops

Project: azure-enterprise-scale-ml (ESML) AI Factory

The Enterprise Scale AI Factory is a plug-and-play solution that automates the provisioning, deployment, and management of AI projects on Azure through a template-based way of working.

Main purpose:

1) Marry multiple best practices & accelerators: It reuses multiple existing Microsoft accelerators, landing zone architectures, and best practices such as CAF & WAF, and provides an end-2-end experience that includes Dev, Test, and Prod environments.

Public links for more info

ESML AIFactory: The 2 project types

Technically, there are two IaC-automated project types in the AIFactory: ESML and GenAI. Here they are shown connected to PERSONAS.

Personas are a tool the AIFactory uses to map tools, processes, and people, so that AI also scales organizationally.
Personas are used for:

1) Finding resource gaps, defining responsibility, or finding redesign needs: If your organization has nobody who fits a persona needed to support a process step, you need to redesign the architecture, change the process, or onboard new people with that persona. Personas are a good tool for defining scope of responsibility.
2) Education: Mapping personas to specific Azure services in the architecture makes it possible to offer targeted educational sessions and online courses for upskilling.
3) Security & Access: Personas mapped to processes, architectures, and services can be used to define which services each persona needs access to in a process.
4) Project planning & Interactions: Personas mapped to each other show which personas primarily interact with each other, which helps when setting up sync meetings and planning projects.
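The resource-gap use of personas can be illustrated with a small sketch. This is not part of the ESML AIFactory: the persona names, process steps, and the `find_persona_gaps` helper are all hypothetical, chosen only to show the idea of comparing the personas a process needs with the personas an organization has.

```python
from typing import Dict, Set

# Hypothetical example data: these steps and persona names are illustrative
# assumptions, not taken from the AIFactory persona definitions.
REQUIRED_PERSONAS_BY_STEP: Dict[str, Set[str]] = {
    "ingest data": {"data engineer"},
    "train model": {"data scientist"},
    "deploy model": {"ml engineer"},
}
STAFFED_PERSONAS: Set[str] = {"data engineer", "data scientist"}


def find_persona_gaps(
    required_by_step: Dict[str, Set[str]], staffed: Set[str]
) -> Dict[str, Set[str]]:
    """Return, per process step, the personas that nobody in the organization fits."""
    gaps: Dict[str, Set[str]] = {}
    for step, personas in required_by_step.items():
        missing = personas - staffed  # set difference: needed but not staffed
        if missing:
            gaps[step] = missing
    return gaps


gaps = find_persona_gaps(REQUIRED_PERSONAS_BY_STEP, STAFFED_PERSONAS)
```

With the sample data, only the "deploy model" step is reported, which would point to one of the three options listed above: redesign the architecture, change the process, or onboard someone who fits that persona.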

Read more about personas

ESML AIFactory: Enterprise Scale Landing Zones Context (VWan option)

The two project types live inside the AIFactory landing zones.

Documentation:

The Documentation is organized around ROLES via Doc series.

| Doc series | Role | Focus | Details |
|---|---|---|---|
| 10-19 | CoreTeam | Governance | Setup of AI Factory. Governance. Infrastructure, networking. Permissions |
| 20-29 | CoreTeam | Usage | User onboarding & AI Factory usage. DataOps for the CoreTeam's data ingestion team |
| 30-39 | ProjectTeam | Usage | Dashboard, Available Tools & Services, DataOps, MLOps, Access options to the private AIFactory |
| 40-49 | All | FAQ | Various frequently asked questions. Please look here, before contacting an ESML AIFactory mentor. |
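As a reading aid, the doc series ranges above can be turned into a small lookup. The sketch below is hypothetical and not part of the repository: it assumes that the leading number of a documentation or script file name (for example `26-add-esml-coreteam-member.ps1`) is its doc series number, which the source does not state explicitly. `DocSeries` and `series_for` are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class DocSeries(NamedTuple):
    low: int
    high: int
    role: str
    focus: str


# Ranges, roles, and focus areas copied from the doc series table above.
DOC_SERIES = [
    DocSeries(10, 19, "CoreTeam", "Governance"),
    DocSeries(20, 29, "CoreTeam", "Usage"),
    DocSeries(30, 39, "ProjectTeam", "Usage"),
    DocSeries(40, 49, "All", "FAQ"),
]


def series_for(filename: str) -> Optional[DocSeries]:
    """Return the doc series matching the leading number of a file name, if any.

    Assumption: the numeric prefix of the file name is the doc series number.
    """
    match = re.match(r"(\d+)", filename)
    if match is None:
        return None  # no numeric prefix, e.g. a name starting with a letter
    number = int(match.group(1))
    for series in DOC_SERIES:
        if series.low <= number <= series.high:
            return series
    return None  # number falls outside every documented range


core_team_script = series_for("26-add-esml-coreteam-member.ps1")
```

Under that assumption, a file prefixed `26-` resolves to the 20-29 series (CoreTeam, Usage). File names whose prefix falls outside 10-49, or that do not start with a digit, return `None` rather than a guess.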

It is also organized via the four components of the ESML AIFactory:

| Component | Role | Doc series |
|---|---|---|
| 1) Infra:AIFactory | CoreTeam | 10-19 |
| 2) Datalake template | All | 20-29, 30-39 |
| 3) Templates for: DataOps, MLOps, *LLMOps | All | 20-29, 30-39 |
| 4) Accelerators: ESML SDK (Python, PySpark), RAG Chatbot, etc | ProjectTeam | 30-39 |

LINK to Documentation

Best practices implemented & benefits

NEWS TABLE

| Date | Category | What | Link |
|---|---|---|---|
| 2024-03 | Automation | Add core team member | 26-add-esml-coreteam-member.ps1 |
| 2024-03 | Automation | Add project member | 26-add-esml-project-member.ps1 |
| 2024-03 | Tutorial | Core-team tutorial | 10-AIFactory-infra-subscription-resourceproviders.md |
| 2024-03 | Tutorial | End-user tutorial | 01-jumphost-vm-bastion-access.md |
| 2024-03 | Tutorial | End-user tutorial | 03-use_cases-where_to_start.md |
| 2024-02 | Tutorial | End-user installation Compute Instance | R01-install-azureml-sdk-v1+v2.m |
| 2024-02 | Datalake - Onboarding | Auto-ACL on PROJECT folder in lake | - |
| 2023-03 | Networking | No Public IP: Virtual private cloud - updated networking rules | https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-secure-workspace-vnet?view=azureml-api-1&preserve-view=true&tabs=required%2Cpe%2Ccli |
| 2023-02 | ESML Pipeline templates | Azure Databricks: Training and Batch pipeline templates. 100% same support as AML pipeline templates (inner/outer loop MLOps) | - |
| 2022-08 | ESML infra (IaC) | Bicep now supports yaml as well | - |
| 2022-10 | ESML MLOps | ESML MLOps v3 advanced mode, support for Spark steps (Databricks notebooks / DatabricksStep) | - |

BACKGROUND - How the accelerator started 2019

ESML stands for: Enterprise Scale ML.

This accelerator was born in 2019 out of a need to accelerate DataOps and MLOps.

The accelerator was then called ESML. Today, within the Enterprise Scale AIFactory, the name ESML refers only to this accelerator, that is, project type=ESML.

THE Challenge 2019

When innovating with AI and machine learning, multiple voices expressed the need for an Enterprise Scale AI & Machine Learning Platform with end-2-end turnkey DataOps and MLOps. Other requirements were an enterprise datalake design able to share refined data across the organization, and high security and robustness: generally available (GA) technology only, and vNet support for pipelines & data with private endpoints. In short, a secure platform with a factory approach to building models.

Even though best practices exist, setting up such an AI Factory solution can be time-consuming and complex. When designing an analytical solution, a private solution without public internet access is often desired, since it is common to work with production data from day one, even in the R&D phase. Cyber security around this is important.

THE Strategy 2019

To meet the requirements and the challenge, multiple best practices needed to be married and implemented, such as CAF/WAF, MLOps, datalake design, AI Factory, and Microsoft Intelligent Data Platform / Modern Data Architecture. An open-source initiative could address all of these at once: this open-source accelerator, Enterprise Scale ML (ESML), for getting an AI Factory on Azure.

THE Solution 2019 - TEMPLATES & Accelerator

ESML provides an AI Factory more quickly (within 4-40 hours), with 1-250 ESMLProjects. An ESML Project is a set of Azure services glued together securely.

IaC & MLOps TEMPLATES 2019: Templates for PIPELINES in project type ESML

The below shows how it looked when ESML automated both the infrastructure and the generation of Azure Machine Learning pipelines, with 3 lines of code.

TRAINING & INFERENCE pipeline template types in the ESML AIFactory that accelerate work for the end user.

Contributing to ESML AIFactory?

This repository is a push-only mirror. Ping Joakim Åström for contributions / ideas.

Because of the "mirror-only" design, pull requests are not possible, except for ESML admins. See the LICENCE file (open source, MIT license). Speaking of open source, contributors: