opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[Discussion] ml-commons tenet and architecture #1205

Open ylwu-amzn opened 10 months ago

ylwu-amzn commented 10 months ago

This issue is to discuss the ml-commons tenets/goals, the architecture, and the underlying design rationale.

Introduction

This repository aims to provide a collection of essential frameworks, tools and APIs to streamline the development, deployment, and management of ML/AI applications. Whether you're a data scientist, developer, or researcher, this repository offers a unified platform to build, deploy, and connect ML/AI models effortlessly.

Goals

Our primary goals for this repository are:

  1. Standardization: Provide a standardized set of tools and APIs that can be reused across various ML/AI projects, promoting consistency and reducing development effort.
  2. Simplicity: Abstract complex processes, such as model training, deployment, and service connection, into simple and intuitive APIs, enabling rapid development.
  3. Flexibility: Support a wide range of ML/AI use cases and frameworks, allowing users to adapt the tools/APIs to their specific requirements.

Architecture

[Figure: ml-commons architecture diagram]

The repository is structured around the following key components:

1. General REST APIs

A comprehensive set of APIs that encompass common functionalities:

2. General Frameworks for ML/AI

3. Client for Vertical ML Features

A client library that empowers developers to build vertical ML features and applications. For instance, plugins like neural-search utilize the ml-commons client to integrate semantic search capabilities.

Design questions

What should be in ml-commons?

  1. General frameworks that are not built for a dedicated or specific vertical area. For example, the Agent framework provides general agent/tool/memory interfaces and management APIs; it is not tied to a dedicated area such as GenAI. Because it is a general framework, any user can implement their own Agent, Tool and Memory to build their own vertical features.
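
To make the "general framework" idea concrete, here is a minimal sketch of what domain-agnostic agent/tool/memory interfaces could look like. All type and method names (`Tool`, `Memory`, `Agent`, `ListMemory`, `SimpleAgent`, `UpperCaseTool`) are hypothetical and chosen for illustration; they are not the actual ml-commons interfaces. The point is only that nothing in the contracts assumes an LLM or any other specific vertical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical interfaces for illustration only; NOT the real ml-commons types.

/** A tool performs one unit of work on behalf of an agent. */
interface Tool {
    String name();
    String run(String input);
}

/** Memory stores the interaction history of an agent. */
interface Memory {
    void save(String message);
    List<String> history();
}

/** An agent runs a named tool for a request and records the exchange. */
interface Agent {
    String execute(String toolName, String input);
}

/** Simple list-backed memory. */
class ListMemory implements Memory {
    private final List<String> messages = new ArrayList<>();

    public void save(String message) { messages.add(message); }

    public List<String> history() { return List.copyOf(messages); }
}

/** Agent that dispatches to a registered tool by name. */
class SimpleAgent implements Agent {
    private final Map<String, Tool> tools = new LinkedHashMap<>();
    private final Memory memory;

    SimpleAgent(Memory memory, List<Tool> tools) {
        this.memory = memory;
        for (Tool tool : tools) {
            this.tools.put(tool.name(), tool);
        }
    }

    public String execute(String toolName, String input) {
        Tool tool = tools.get(toolName);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        String output = tool.run(input);
        memory.save(toolName + "(" + input + ") -> " + output);
        return output;
    }
}

/** A vertical feature would plug in its own tools; this one just upper-cases text. */
class UpperCaseTool implements Tool {
    public String name() { return "upper_case"; }

    public String run(String input) { return input.toUpperCase(Locale.ROOT); }
}

class AgentSketch {
    public static void main(String[] args) {
        Memory memory = new ListMemory();
        Agent agent = new SimpleAgent(memory, List.of(new UpperCaseTool()));
        System.out.println(agent.execute("upper_case", "hello"));
        System.out.println(memory.history());
    }
}
```

Under this split, the interfaces and the dispatching logic would be the kind of thing that lives in a commons layer, while a concrete tool such as `UpperCaseTool` (or, in practice, a search or LLM-backed tool) would be supplied by the vertical feature.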

What should not be in ml-commons?

  1. Vertical ML applications/features. For example, the neural search feature, which focuses only on the neural search area, and the PPL ML command, which is a special PPL command that supports running ML models.

jonfritz commented 10 months ago

Thanks for putting this together! I think this is a good start, and I agree that ml-commons should be a standard set of tools that support various, more specific AI/ML use cases. Model training, deployment, and service connection are common across many ML use cases, and make sense at a "commons" layer. I think we need to more crisply describe what a general framework vs. a vertical framework is, because I think you can make a case that some frameworks that support a more narrow use case are general (and belong in "commons"), or vice versa.

Although we are starting with conversational search, memory, and agents in ml-commons (though, as pointed out in other conversations, this is not a one-way door, and all of this may end up in an AI-commons someday), I'm not sure the way we are addressing it here follows. For instance, I'm not sure which agent use cases are not tied to generative AI or LLMs. Instead, I might approach it from the angle that agents are a way to interact with AI models, similar to how inference requires a way to interact with a model. If we're supplying the lowest-level components to build agents, that could fit the story more clearly.