ylwu-amzn commented 1 year ago

In 2.9, ml-commons support remote inference with connector. This issue designs how to build a general Agent framework by leveraging remote LLM.

Why we need to build Agent?

To solve a complex problem, the process generally is hard to be predefined. We need to find some way to solve the problem step by step, identify potential solutions to reach a resolution.

Architecture

ml-commons-cot-arc (1)

Components

Model

ml-commons released remote inference feature in 2.9. We can create remote model with LLM connector. For example create a remote model with OpenAI Chat model.

Prompt Template

Define prompt template for LLM.

Prompt repo: generic prompt template
Prompt index: user’s custom prompt template

User can refer to the prompt template with prompt id in “prompt repo” or “prompt index”.

Agent

Agent is a coordinator which uses LLM to reason what action to take to solve problem, then coordinate action execution. The action execution sequence could be not predefined/hard-coded. We plan to make the framework flexible to support multiple Agent types (like flow, CoT etc). But for the first phase, we can start from supporting conversational ReAct Agent.

Tools

Tool is a function which can be executed by Agent. We will define a Tool framework in ml-commons. We will build some general built-in tools, for example

OpenSearchIndexTool which will gather OpenSearch index information like index status
VectorDBTool which will support running vector search
SeachIndexTool which support search OpenSearch index

The Tool framework will be extensible. Other plugins can build their own Tools by implementing Tool interface. For example AnomalyDetection plugin can build AnomalyResultTool which can query anomaly detection result and analyze, and build "AnomalyDetectorTool" to create anomaly detector. Some use cases to use these tool

User ask "How many anomalies detected in last 24 hours". Then Agent will use AnomalyResultTool to query anomaly result index about how many anomalies detected in last 24 hours.
User ask "I have one index: my_log_data, it will receive realtime log data, can you help create an anomaly detector and monitor the anomaly and send out alarm to my email xxx@yyy.com". Then Agent will use AnomalyDetectorTool to create anomaly detector for this index, then use AlertingMonitorTool create a anomaly monitor with detector and user's email.

ml-commons will be the central place to manage all of these tools. It will provide tool management functions:

List all tools
Get metadata of a specific tool
Enable/disable a tool
Custom tool: painless script

Possible design:

// Parser will parse some input string and do some transformation.
// For example, parse csv content into json format.
public interface Parser {
   String parse(String input);
}

public interface Tool {
    //Input parser will parse the input
    void setInputParser(Parser parser);
    //Output parser will parse tool's output
    void setOutputParser(Parser parser);

    // Run tool and return result.
    <T> T run(String input, Map<String, String> toolParameters);

    // Get tool name
    String getName();

    // *Get tool description.*
    String getDescription();

    String getExamples();

    /**
    * Validate if the input is good.
    */
    boolean validate(String input, Map<String, String> toolParameters);

    /***
    ** Check if should end the whole CoT immediately.*
    ** For example, if some critical error detected like high memory pressure,*
    ** the tool may end the whole CoT process by returning true.*
    **/
    default boolean end(String input, Map<String, String> toolParameters){return false;}
}

Tool could be very general, for example

SearchPipelineTool: run a search pipeline
PainlessScriptTool: execute painless script.
VectorDBTool: search question in vector DB and return top N document.
ReRankTool: wrap a remote ReRank model and do the ReRank work.
PPLQueryGenerateTool: use some query generation model to translate natural language question to PPL query. Similarly, we can also build SQLQueryGenerateTool, DSLQueryGenerateTool

User can provide parameters to customize their own tools. For example for VectorDBTool, user can customize the embedding_model_id, knn_idnex etc.

"VectorDBTool": {
    "embedding_model_id": "<mode_id>",
    "idnex": "product_index", # knn index
    "size": 5
}

Toolkit

A set of tools which can work together to complete some target task. For example, Anomaly Detection (AD) plugin can build multiple tools like AnomalyResultTool, AnomalyDetectorTool, HistoricalAnalysisTool etc. Then AD can create a Tookit AnomalyToolkit for all of these anomaly related tools. So user can just configure AnomalyToolkit in Agent, will automatically add all of these AD tools to Agent.

Memory

A general memory layer for storing the historical interactions. For example, the chat/conversational agent needs to save session history and continue the session later.

Design

Option1: add new concept: Agent

User needs to create a remote model for LLM first. Then user can use the model id and tools to create a new Agent. The workflow will be:

ml-commons1

POST /_plugins/_ml/agents/_register
{
   "name": "product recommendation agent",
   "description": "this is a test agent",
   "model_id": "<llm_model_id>",
   "tools": {
        "VectorDBTool": {
            "embedding_model_id": "<mode_id>",
            "idnex": "my_products"
            "size": 5
        },
        "ReRankTool": {
            "model_id": "<rerank_model_id>"
        }
   },
   "parameters": {
        "prompt": "...",
        "key": "value"
        ...
   },
   "memory":  {
        type: "",
        // different memory implementations
   } 
}

We will provide get, search and update Agent APIs.

User don’t need to deploy Agent, they can run an Agent directly if the model is deployed.

POST /_plugins/_ml/agents/<agent_id>/_execute
{
   "parameters": {
        "prompt": "...",
        "question": "..."
        ...
   }
}

Pros:

Clean solution. Separate model and Agent. Model will keep as is, Agent will do the coordinating/orchestration work.
Easier to extend. If need to support more Agent types, no impact to model, just need tune Agent part.

Cons:

More effort: need to build new agent index; build a set of APIs for agent
ml-commons client need to integrate with new execute agent API.

Two options for agent access control:

Option1: Add new access control on Agent level similar to model access control.
Option2: As Agent always need a model id, we can leverage model access control for Agent. If user has access to a model, then user has access to Agents using that model.

Option2: Extend model by adding agent field

We extend current model by adding Tools or Agent fields. Such model will be a CoT model. A CoT model must be some LLM which supports generative AI, for example OpenAI ChatGPT model, Anthropic Claude.

The workflow will be

ml-commons2

POST /_plugins/_ml/models/_register
{
  "name": "OpenAI Chat Model",
  "function_name": "remote",
  "model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
  "description": "test model",
  "connector": { # remote LLM
    "name": "OpenAI Chat Connector",
    "description": "The connector to public OpenAI model service for GPT 3.5",
    "version": 1,
    "protocol": "http",
    "parameters": {
      "endpoint": "api.openai.com",
      "model": "gpt-3.5-turbo"
    },
    "credential": {
      "openAI_key": "..."
    },
    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "url": "https://${parameters.endpoint}/v1/chat/completions",
        "headers": {
          "Authorization": "Bearer ${credential.openAI_key}"
        },
        "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
      }
    ]
  },
  "agent": {
      "type": "conversational",
      "tools": {
        "MathTool": {
            "n": 10
        }, 
        "SearchIndexTool": {
            "size": 5
        }
      }
  }
}

Pros:

Less effort. No need to add new agent index, no need to build new APIs.
ml-commons client doesn’t impact, they can still use current predict model API.

Cons:

If we need to build new Agent type or change Agent part, the model will be also impacted.

HenryL27 commented 1 year ago

1161 and #1150 represent rather different philosophies in terms of integrating GenAI into OpenSearch. #1161 seeks to provide a framework for building conversational apps that happen to use some OpenSearch features. #1150 seeks to provide a conversational interface over your favorite search engine. Both are valid. Neither should be core OpenSearch features, and furthermore, I think that neither belong in ML-Commons. ML-Commons is for training and predicting with ML models, while these RFCs are for building Generative AI applications.

Accordingly, I’d like to call for the creation of an ‘AI-Commons’ plugin as an extension to ML-Commons. #1150 and #1161 will look pretty similar, code-wise, so I imagine it should be pretty easy to share a codebase. Both need conversational memory of some form; both need prompt templating of some form.

Why do we want both? I imagine developers picking AI-Commons up for the RAG of #1150 - this will provide a good starting point for people looking to spice up their existing search application with some GenAI pizzazz. In many use-cases, this will be sufficient. But gradually, these conversational search apps will acquire peculiarities and requirements that the RAG pipeline might not support. Then #1161’s CoT will be required, and these apps will cross a line where they stop being fancy search apps and start being fancy AI apps. Therefore it should be easy to go from RAG to CoT - RAG should be in the CoT ecosystem, but should also be able to stand alone.

As an example, what does answering the question “What happens if a ship jackknifes in the Suez Canal?” entail? RAG will try to answer in a single query (granted, with some potentially clever query rewriting), but unless there’s a document detailing the answer to this question, RAG is hopeless. CoT, however, will ask a series of queries, one step at a time, to build up and derive an answer. For example - “What are the major trade routes through the Suez Canal?”, “What are the shipping routes from Oman to America?”, “How long are they?”, “What products does this particular ship carry?”, “What is the demand and backlog of this particular product?”, etc.

Great! Well, if CoT is so powerful, why are we bothering with RAG? A couple of reasons. 1/ RAG is much simpler. Personally, I prefer using predictable tools that I understand. I know exactly what RAG is going to do: query OpenSearch, then pipe the results into an LLM for synthesis. I don’t know what a CoT agent is going to do, given an arbitrary query. That’s what makes it so powerful - that’s gets to choose how to answer - but I don’t quite trust it to do the right thing. And trust is everything when it comes to GenAI adoption. So if we let RAG build up trust in the system, then people will be more comfortable switching to CoT. 2/ RAG is closer to search. OpenSearch users want to do search. Throwing a whole CoT GenAI infrastructure at someone who just wants to do search is going to alienate them. But a GenAI interface (RAG) over their search, maybe that will be easier to stomach. Finally, 3/ RAG is probably cheaper, cost- and performance-wise - only one or two LLM inferences instead of several for every query.

So both #1150 and #1161 should happen, imo, and in the same place. How can we combine our efforts? The absolute first thing is that we all need to be aware of what code already exists. I’ve published some code, and I would urge everyone else here to do the same. We don’t all work together, so if we want to work together, we need to be able to see each other’s work. As far as integrating the RFCs into one project - first I’m gonna vote to separate agents from models (the first option from #1161). Then my proposed plan:

1150 will proceed as planned - conversational memory, RAG pipelines and API - in AI-Commons (or whatever we call the conversational AI plugin)
Everyone seems to agree on prompt templates; some implementation of that will exists.
1161 CoT will continue as planned, also in AI-Commons, with a RAG tool that uses the RAG pipeline

In general with CoT, I think we don’t want to give the LLM too many options. We should try to keep as much complexity within the tools as possible, and focus on giving them clean and intuitive interfaces - LLMs are basically pure intuition.

I hope this plan is agreeable to people.

p.s. can we resolve #1151? It looks like #1161 and #1150 partition it.

ylwu-amzn commented 1 year ago

Thanks @HenryL27 for the detailed analysis and explanation. We are going to resolve #1151.

So we have two options for how to organize the code: Option1: use current ml-commons Option2: creating a new AI-Commons

Both have some pros and cons. For #1161, we could leverage current ml-commons ML framework and it will be less effort for both ml-commons and ml-commons clients(they don't need to move to a new AI-Commons). And this also matches the long-term roadmap: use ml-commons as the commons layer for ml/AI framework. Train/predict API is not the whole thing for this layer.

I think for #1150 , its scope is clear and well defined. Similar to #1151, I think people all agree a conversation plugin is necessary and the conversational search is important feature. I would like to keep the scope as is. That could make the discussion and design clean and easier. Agree to your analysis for "if CoT is so powerful, why are we bothering with RAG" part. CoT is more general, but I totally agree RAG is necessary. Let's don't combine these two things together too early. That will make the scope much bigger and complicated. Let's discuss the well-scoped RFC separately first, once these separate RFC are aligned, we can continue next step about check if it's possible to leverage some common thing for example add another option: build RAG Agent. But I'm also fine if you think we should list all these possible options and discuss them together first. But please list all options clearly and simply (make the scope clear), and analyze the pros and cons.

HenryL27 commented 1 year ago

Okay, here are some options I've come up with. Note that pros and cons are subjective, so I'm interested in your opinions regarding these. Also, this is by no means a comprehensive list, and I'll bet a lot of the listed pros and cons apply to other options as well.

Option 1: AI-Commons Pros:

Shared codebase - more integratable, a lot of components can be used in common
=> easier for users to switch between RAG and CoT apps while keeping the same settings and stuff
All conversational features live in one place
Avoids ml-commons bloat

Cons:

Lose a lot of ml-commons framework for CoT implementation
Forces agents and models to be separate entities (I think this is the natural implementation, but good to have options)
Potentially bloated AI-commons with multiple scopes
Awkward scenario with RAG being its own thing but also usable by CoT
'Artificial' division between AI and ML

Option 2: Separate RAG; CoT in ML-Commons Pros:

easier CoT implementation taking advantage of ML-Commons scaffolding
RAG and CoT implementations can defer thinking about each other
Scopes are as defined in the RFCs; don't need to worry about other AI projects popping up that an AI-Commons plugin would have to support

Cons:

RAG users will have trouble moving their settings to CoT when they need to make more complicates apps
When (if) we want to integrate these it'll be that much harder
ML-Commons supports a bunch of AI stuff now that maybe not everyone wants (I guess you don't have to use it, but still).

Option 3: Separate RAG, Separate CoT Pros:

Scope is clear. 2 RFCs, 2 plugins.

Cons:

Harder CoT implementation.
Maybe unnecessarily modular
Harder integrations on all fronts

Option 4: Everything in ML-Commons Pros:

Easy integrations
User experience requires the least change

Cons:

A lot of stuff ends up in ML-Commons that has little to do with ML: Chat history, tools management, prompt templating. I realize that putting all this stuff into ML-Commons is your long-term plan, but I'd be interested in why you've planned to go down this route.

Option 5: Separate CoT, RAG in ML-Commons Pros:

Kinda funny to imagine, given everything

jonfritz commented 1 year ago

@ylwu-amzn "And this also matches the long-term roadmap: use ml-commons as the commons layer for ml/AI framework. Train/predict API is not the whole thing for this layer."

Can you pls share where this roadmap and the RFC/doc outlining the strategy behind it? Would also love to see the community discussion around the pros/cons, and if not, perhaps now would be a good time for folks to weigh in.

ylwu-amzn commented 1 year ago

Sorry that our doc is not so up to date, we need to fine tune the readme doc to reflect the latest change like remote model. But you can see "Machine Learning Commons for OpenSearch is a new solution that make it easy to develop new machine learning feature." in our current readme. We meant to use ml-commons as a common layer to make it easy for building any ML (also AI actually) application/feature.

neural-search plugin(doc link) is a good example for this. It depends on ml-commons client to build semantic search feature.

ylwu-amzn commented 1 year ago

@HenryL27 Thanks for the quick response. I think the main point is how to organize the code, different way will be a new option. Like you mentioned, separate RAG/COT , or build in ml-commons or new AI-commons.

Like I replied in last comment, ml-commons will be a common layer for ML/AI things which provides frameworks/easy-to-use APIs like train/predict , resource management like managing models, routes requests etc. This can reduce the fragmentation. Reduce the common framework thing into another "commons" repo will make it harder to maintain. You can check neural-search plugin, it's a ML/AI feature built on top of ml-commons, just add ml-commons jar client and call predict API. I would suggest use the similar way to build RAG feature: create a new RAG(or any other better name) repo, then leverage ml-commons java client jar to invoke model or Agent. Well decoupled also keep the common thing like ML framework and Agent things in one common place.

Edit: another option is adding RAG feature to neural-search plugin

dtaivpp commented 1 year ago

Haven't gone through the whole thread yet but wanted to drop this in for discussion/consideration. Haystack has a PromptHub framework for fetching, updating, and using prompts.

https://haystack.deepset.ai/blog/share-and-use-prompt-with-prompthub

ylwu-amzn commented 1 year ago

@dtaivpp Thanks for sharing this, I think we should consider integrating with haystack or other prmopthub.

HenryL27 commented 11 months ago

Hey @ylwu-amzn, it's been a bit since I've seen anything reference this issue. I'm wondering where we are in the development required to get the framework implemented. Is there anything we can do to help?

ylwu-amzn commented 11 months ago

hi, @HenryL27 , We already have one PoC. Will publish soon.

dtaivpp commented 9 months ago

On the note of conversational memory; it was brought up in today's community meeting that we should have hooks for ISM that allow the chat history to be deleted after a certain period of time. At the moment I believe we only support deleting indexes based on their creation time but for this depending on how the conversational memory is implemented will need to delete individual documents based on insert time.

dhrubo-os commented 5 months ago

Closing this issue as Agent framework is delivered as GA in 2.13.0

opensearch-project / ml-commons

[RFC] Agent framework #1161

Why we need to build Agent?

Architecture

Components

Model

Prompt Template

Agent

Tools

Toolkit

Memory

Design

Option1: add new concept: Agent

Option2: Extend model by adding agent field

1150 will proceed as planned - conversational memory, RAG pipelines and API - in AI-Commons (or whatever we call the conversational AI plugin)

1161 CoT will continue as planned, also in AI-Commons, with a RAG tool that uses the RAG pipeline