opensearch-project / dashboards-assistant

Dashboards Assistant lets users of OpenSearch Dashboards interact with an assistant through chat or from the different OSD pages.
https://opensearch.org/
Apache License 2.0

[RFC] OpenSearch Assistant Toolkit #18

Open lezzago opened 9 months ago

lezzago commented 9 months ago

Proposal

Introduction

OS Assistant is an LLM-powered assistant integrated into OpenSearch that users can interact with through chat, from integrated OpenSearch Dashboards pages, or through APIs. The main goal is to lower the friction of interacting with OpenSearch features and to guide users toward the actions they want to perform. It should also be able to aid users in analyzing their data to gain further insights, such as understanding more about the errors coming in from a log source.

Users will find integration points to OS Assistant on pages such as the Alerting plugin page, where the assistant can provide more insight on specific alerts and aid in the root cause analysis of those alerts. Through the APIs, plugins can power their features with the LLM; for example, the Alerting plugin could include summarized alert information in the notifications it sends to its users. The APIs will also empower OpenSearch users to build their own chat applications on top of OS Assistant for their customers' use cases without having to build one from scratch.

Terminology/References:

Architecture/Design

Approach 1 [Recommended]: Use ML Commons and Flow Framework

Setup flow (diagram):

  1. A template is provided to OS Assistant to initialize the assistant; a sketch of such a template follows these steps. The cluster super admin will normally execute this setup (it is also possible for another user with sufficient roles/permissions to do this, and this is open to debate).
  2. OS Assistant will call Flow Framework to do the provisioning with the template provided by the user.
  3. Flow Framework will call the appropriate ML Commons Model Serving Framework APIs to configure the model. This is the model the user provided to OS Assistant.
  4. ML Commons will create the connector.
  5. Flow Framework will call the appropriate ML Commons Agent Framework APIs to register each requested sub-agent and the tools each sub-agent uses.
  6. The Agent Framework will return an agent ID for each configured sub-agent.
  7. Flow Framework will call the ML Commons Agent Framework API to register the root agent, providing the sub-agent IDs for its nested Agent Tools.
  8. ML Commons will return the root agent ID to Flow Framework.
  9. Flow Framework will enable conversational memory in the cluster settings (if not already enabled) and then return the root agent ID to OS Assistant. This ID will be used for future user chat queries, which will go directly from OS Assistant to ML Commons. OS Assistant will manage conversations on the front end.
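To make step 1 concrete, below is a minimal sketch of what such a template and provisioning call could look like, assuming the Flow Framework workflow API (`POST /_plugins/_flow_framework/workflow`) and its `create_connector`, `register_remote_model`, and `register_agent` step types. The node names, connector parameters, and tool list are illustrative placeholders, not a finalized contract:

```
POST /_plugins/_flow_framework/workflow?provision=true
{
  "name": "os_assistant_setup",
  "description": "Illustrative template that provisions the OS Assistant root agent",
  "use_case": "REGISTER_AGENT",
  "workflows": {
    "provision": {
      "nodes": [
        {
          "id": "create_llm_connector",
          "type": "create_connector",
          "user_inputs": {
            "name": "assistant_llm_connector",
            "protocol": "http",
            "parameters": { "endpoint": "<llm-endpoint>", "model": "<model-name>" },
            "credential": { "api_key": "<api-key>" }
          }
        },
        {
          "id": "register_llm",
          "type": "register_remote_model",
          "previous_node_inputs": { "create_llm_connector": "connector_id" },
          "user_inputs": { "name": "assistant_llm" }
        },
        {
          "id": "register_root_agent",
          "type": "register_agent",
          "previous_node_inputs": { "register_llm": "model_id" },
          "user_inputs": {
            "name": "os_assistant_root_agent",
            "type": "conversational",
            "tools": [
              { "type": "AgentTool", "parameters": { "agent_id": "<sub-agent-id>" } }
            ]
          }
        }
      ]
    }
  }
}
```

Sub-agents would be registered by additional `register_agent` nodes whose agent IDs feed the root agent's `AgentTool` entries; the final step surfaces the root agent ID back to OS Assistant (steps 7-9).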
Chat flow (diagram):
  1. The user sends a question to the OS Assistant.
  2. OS Assistant passes the user's question, along with the user's chat history if there is one, to the Agent Framework (see the request sketch after these steps).
  3. The Agent Framework interfaces with the Model Serving Framework to access the model based on the user's question.
  4. The Model Serving Framework accesses the model for the OS Assistant.
  5. The model response is returned to the Agent Framework to inform how the tools should be used.
  6. The Agent Framework accesses OpenSearch and plugins using the tools within the Agent Framework.
  7. OpenSearch and plugins return the data to the Agent Framework.
  8. The Agent Framework collates the responses and returns the answer to the user's question to OS Assistant.
  9. The memory ID for the user's chat session is created or updated based on the user's question.
  10. The response to the user's question is returned, along with traces of how the response was generated.
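As a sketch of how a chat turn (steps 1-2 and 8-10) could map onto the ML Commons Agent Framework, the call below assumes the agent execute API; the question text and parameter names are illustrative:

```
POST /_plugins/_ml/agents/<root-agent-id>/_execute
{
  "parameters": {
    "question": "Why did my error rate spike in the last hour?",
    "memory_id": "<memory-id-from-a-previous-turn>"
  }
}
```

The response would carry the generated answer together with the memory ID for the session, which OS Assistant stores and resends on the next turn; tool traces are recorded against the same memory to support step 10.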
Pros:
Cons:

Approach 2: Use ML Commons and not Flow Framework

Setup flow (diagram):
  1. A template is provided to OS Assistant to initialize the assistant. The cluster super admin will normally execute this setup (it is also possible for another user with sufficient roles/permissions to do this, and this is open to debate).
  2. OS Assistant will call the Model Serving Framework to establish the connection with the LLM (a sketch of these calls follows these steps).
  3. The Model Serving Framework will create the connector to the LLM.
  4. The Model Serving Framework will return the model ID.
  5. OS Assistant will call the appropriate ML Commons Agent Framework APIs to register each requested sub-agent.
  6. The Agent Framework will return an agent ID for each configured sub-agent.
  7. OS Assistant will call the ML Commons Agent Framework API to register the root agent, providing the sub-agent IDs for its nested Agent Tools.
  8. ML Commons will return the root agent ID to OS Assistant. OS Assistant will enable conversational memory in the cluster settings (if not already enabled). This ID will be used for future user chat queries. OS Assistant will manage conversations on the front end.
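A minimal sketch of steps 2-8 as direct ML Commons calls. The connector body, tool names (e.g. `SearchAlertsTool`), and agent parameters are illustrative assumptions:

```
# Steps 2-4: create the connector and register the remote model
POST /_plugins/_ml/connectors/_create
{
  "name": "assistant_llm_connector",
  "protocol": "http",
  "parameters": { "endpoint": "<llm-endpoint>", "model": "<model-name>" },
  "credential": { "api_key": "<api-key>" }
}

# Steps 5-6: register a sub-agent and the tools it uses; returns an agent ID
POST /_plugins/_ml/agents/_register
{
  "name": "alerting_sub_agent",
  "type": "flow",
  "tools": [ { "type": "SearchAlertsTool" } ]
}

# Steps 7-8: register the root agent, nesting each sub-agent as an AgentTool
POST /_plugins/_ml/agents/_register
{
  "name": "os_assistant_root_agent",
  "type": "conversational",
  "llm": { "model_id": "<model-id>", "parameters": { "max_iteration": 5 } },
  "memory": { "type": "conversation_index" },
  "tools": [ { "type": "AgentTool", "parameters": { "agent_id": "<sub-agent-id>" } } ]
}
```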
Chat flow (diagram):
  1. The user sends a question to the OS Assistant.
  2. OS Assistant passes the user's question, along with the user's chat history if there is one, to the Agent Framework.
  3. The Agent Framework interfaces with the Model Serving Framework to access the model based on the user's question.
  4. The Model Serving Framework accesses the model for the OS Assistant.
  5. The model response is returned to the Agent Framework to inform how the tools should be used.
  6. The Agent Framework accesses OpenSearch and plugins using the tools within the Agent Framework.
  7. OpenSearch and plugins return the data to the Agent Framework.
  8. The Agent Framework collates the responses and returns the answer to the user's question to OS Assistant.
  9. The memory ID for the user's chat session is created or updated based on the user's question (see the memory API sketch after these steps).
  10. The response to the user's question is returned, along with traces of how the response was generated.
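Steps 9-10 lean on ML Commons conversational memory. A minimal sketch, assuming the memory feature flag and the memory/trace read APIs (exact paths may differ by version):

```
# Enable conversational memory once during setup, if not already enabled
PUT /_cluster/settings
{
  "persistent": { "plugins.ml_commons.memory_feature_enabled": true }
}

# Step 9: list the messages recorded for a chat session
GET /_plugins/_ml/memory/<memory-id>/messages

# Step 10: fetch the traces behind a single answer for explainability
GET /_plugins/_ml/memory/message/<message-id>/traces
```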
Pros:
Cons:

Approach 3: Use Langchain and Model Serving Framework

Setup flow (diagram):
  1. A template is provided to OS Assistant to initialize the assistant. The cluster super admin will normally execute this setup (it is also possible for another user with sufficient roles/permissions to do this, and this is open to debate).
  2. OS Assistant will call the Model Serving Framework to establish the connection with the LLM.
  3. The Model Serving Framework will create the connector to the LLM.
  4. The Model Serving Framework will return the model ID.
Chat flow (diagram):

Langchain (reference) is a framework for developing applications powered by LLMs. In this solution, we would implement the Langchain framework, build a set of tools for it, and use it to interact with OpenSearch.

  1. The user sends a question to the OS Assistant.
  2. OS Assistant passes the user's question, along with the user's chat history if there is one, to Langchain.
  3. Langchain interfaces with the Model Serving Framework to access the model based on the user's question.
  4. The Model Serving Framework accesses the model for the OS Assistant.
  5. The model response is returned to Langchain to inform how the tools should be used.
  6. Langchain accesses OpenSearch and plugins using the tools within Langchain.
  7. OpenSearch and plugins return the data to Langchain.
  8. Langchain collates the responses and returns the answer to the user's question to OS Assistant.
  9. The response to the user's question is returned, along with traces of how the response was generated.
Pros:
Cons:

Workflows

Below are workflows, based on the recommended design, showing what happens during cluster setup and the chat user flow.

Cluster setup (diagram)

Chat user flow (diagram)

Test Framework

Today, whenever there's a change in a prompt or model, we need to manually re-test all the previously tested questions and manually verify the responses. This process is tedious and prone to missing regressions. With this test framework, we want to automate most of these steps and provide more objective ways of testing the LLM responses. This way, we can take a more test-driven approach to any prompt or model changes. Overall, the framework provides value to OS Assistant in the following ways:

  1. Regression tests for comparing prompt changes
  2. Testing and comparing different models with the same prompts

Approach 1 [Recommended]


The user initiates a test run, which triggers a Promptfoo process with all the prompt inputs and configurations. The test process calls the OS Assistant APIs to test the LLM with combinations of prompts and inputs. OS Assistant internally uses the ML Commons APIs to make the actual calls to the LLM endpoints. Meanwhile, all the intermediate tool API calls are mocked with static responses by running OpenSearch Dashboards in a test environment. Once we have the LLM responses, they are tested against different types of assertions. The summary of the results is stored as JSON files.
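As a sketch, a Promptfoo configuration for this flow might look like the JSON below (Promptfoo also accepts YAML). The assistant endpoint, HTTP provider wiring, and assertions are illustrative assumptions rather than the finalized test contract:

```json
{
  "description": "Regression tests for OS Assistant prompts (illustrative)",
  "providers": [
    {
      "id": "https",
      "config": {
        "url": "http://localhost:5601/api/assistant/send_message",
        "method": "POST",
        "body": { "question": "{{question}}" }
      }
    }
  ],
  "tests": [
    {
      "vars": { "question": "How many error logs came in over the last hour?" },
      "assert": [
        { "type": "contains", "value": "error" },
        { "type": "llm-rubric", "value": "Answers with a count of error logs" }
      ]
    }
  ]
}
```

Each run would write its summary to JSON files that can be diffed across prompt or model revisions to spot regressions.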

Components for the Test Framework:

Pros:
Cons:

Approach 2:


In this approach, there is only one difference: the Promptfoo process creates prompt template inputs dynamically from the OS Assistant codebase to test intermediate steps in the agents. Also, the test process calls the ML Commons APIs directly rather than going through OS Assistant.

Pros:
Cons:

Security

The assistant is just an interface to the LLM and the ML Commons agent tools. The assistant will run the OpenSearch APIs and access data as the user, since it calls the Agent Framework as the user interacting with the OpenSearch Assistant. This prevents users from escalating privileges and accessing data they do not have access to. Additionally, the chat sessions are stored in the Agent Framework, so the assistant does not have to worry about managing that data for the user.

References

hdhalter commented 6 months ago

Hi @lezzago, are there any documentation implications for 2.13? Thanks.