opensearch-project / dashboards-assistant

Dashboards Assistant lets users of OpenSearch Dashboards interact with an assistant through chat or from the different OSD pages.
https://opensearch.org/
Apache License 2.0

[RFC] OpenSearch Assistant Toolkit #18

Open lezzago opened 9 months ago

lezzago commented 9 months ago

Proposal

Introduction

OS Assistant is an LLM-powered assistant integrated into OpenSearch that users can interact with through chat, from integrated OpenSearch Dashboards pages, or through APIs. The main goal is to lower the friction of interacting with OpenSearch features and to guide users toward the actions they want to perform. It should also be able to aid users in analyzing their data to gain further insights, such as understanding more about the errors coming in from a log source.

Users will find integration points to OS Assistant on pages such as the Alerting plugin page, where the assistant can provide more insight on specific alerts and aid in the root cause analysis of those alerts. Through the APIs, plugins can power their features with the LLM; for example, the Alerting plugin could include summarized alert information in the notifications it sends to its users. The APIs will also empower OpenSearch users to build their own chat applications on top of OS Assistant for their customers' use cases without having to build one from scratch.

Terminology/References:

Architecture/Design

Approach 1 [Recommended]: Use ML Commons and Flow Framework

Setup flow (diagram):

  1. A template is provided to OS Assistant to initialize the assistant; a sketch of such a template follows these steps. The cluster super admin will normally execute this setup (it is also possible for another user with sufficient roles/permissions to do this, and this is open to debate).
  2. OS Assistant will call Flow Framework to do the provisioning with the template provided by the user.
  3. Flow Framework will call the appropriate ML Commons Model Serving Framework APIs to configure the model. This is the model the user provided to OS Assistant.
  4. ML Commons will create the connector.
  5. Flow Framework will call the appropriate ML Commons Agent Framework APIs to register each requested sub-agent and the tools each sub-agent uses.
  6. The Agent Framework will return an agent ID for each configured sub-agent.
  7. Flow Framework will call the ML Commons Agent Framework API to register the root agent, providing the sub-agent IDs for its nested Agent Tools.
  8. ML Commons will return the root agent ID to Flow Framework.
  9. Flow Framework will enable conversational memory in the cluster settings (if not already enabled) and then return the root agent ID to OS Assistant. This ID will be used for future user chat queries, which will go directly from OS Assistant to ML Commons. OS Assistant will manage conversations on the front end.
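To make step 1 concrete, below is a minimal sketch of what such a template and provisioning call could look like, assuming the Flow Framework workflow API (`POST /_plugins/_flow_framework/workflow`) and its `create_connector`, `register_remote_model`, and `register_agent` step types. The node names, connector parameters, and tool list are illustrative placeholders, not a finalized contract:

```
POST /_plugins/_flow_framework/workflow?provision=true
{
  "name": "os_assistant_setup",
  "description": "Illustrative template that provisions the OS Assistant root agent",
  "use_case": "REGISTER_AGENT",
  "workflows": {
    "provision": {
      "nodes": [
        {
          "id": "create_llm_connector",
          "type": "create_connector",
          "user_inputs": {
            "name": "assistant_llm_connector",
            "protocol": "http",
            "parameters": { "endpoint": "<llm-endpoint>", "model": "<model-name>" },
            "credential": { "api_key": "<api-key>" }
          }
        },
        {
          "id": "register_llm",
          "type": "register_remote_model",
          "previous_node_inputs": { "create_llm_connector": "connector_id" },
          "user_inputs": { "name": "assistant_llm" }
        },
        {
          "id": "register_root_agent",
          "type": "register_agent",
          "previous_node_inputs": { "register_llm": "model_id" },
          "user_inputs": {
            "name": "os_assistant_root_agent",
            "type": "conversational",
            "tools": [
              { "type": "AgentTool", "parameters": { "agent_id": "<sub-agent-id>" } }
            ]
          }
        }
      ]
    }
  }
}
```

Sub-agents would be registered by additional `register_agent` nodes whose agent IDs feed the root agent's `AgentTool` entries; the final step surfaces the root agent ID back to OS Assistant (steps 7-9).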
Chat flow (diagram):
  1. The user sends a question to the OS Assistant.
  2. OS Assistant passes the user's question, along with the user's chat history if there is one, to the Agent Framework (see the request sketch after these steps).
  3. The Agent Framework interfaces with the Model Serving Framework to access the model based on the user's question.
  4. The Model Serving Framework accesses the model for the OS Assistant.
  5. The model response is returned to the Agent Framework to inform how the tools should be used.
  6. The Agent Framework accesses OpenSearch and plugins using the tools within the Agent Framework.
  7. OpenSearch and plugins return the data to the Agent Framework.
  8. The Agent Framework collates the responses and returns the answer to the user's question to OS Assistant.
  9. The memory ID for the user's chat session is created or updated based on the user's question.
  10. The response to the user's question is returned, along with traces of how the response was generated.
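As a sketch of how a chat turn (steps 1-2 and 8-10) could map onto the ML Commons Agent Framework, the call below assumes the agent execute API; the question text and parameter names are illustrative:

```
POST /_plugins/_ml/agents/<root-agent-id>/_execute
{
  "parameters": {
    "question": "Why did my error rate spike in the last hour?",
    "memory_id": "<memory-id-from-a-previous-turn>"
  }
}
```

The response would carry the generated answer together with the memory ID for the session, which OS Assistant stores and resends on the next turn; tool traces are recorded against the same memory to support step 10.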
Pros:
Cons:

Approach 2: Use ML Commons and not Flow Framework

Setup flow (diagram):
  1. A template is provided to OS Assistant to initialize the assistant. The cluster super admin will normally execute this setup (it is also possible for another user with sufficient roles/permissions to do this, and this is open to debate).
  2. OS Assistant will call the Model Serving Framework to establish the connection with the LLM (a sketch of these calls follows these steps).
  3. The Model Serving Framework will create the connector to the LLM.
  4. The Model Serving Framework will return the model ID.
  5. OS Assistant will call the appropriate ML Commons Agent Framework APIs to register each requested sub-agent.
  6. The Agent Framework will return an agent ID for each configured sub-agent.
  7. OS Assistant will call the ML Commons Agent Framework API to register the root agent, providing the sub-agent IDs for its nested Agent Tools.
  8. ML Commons will return the root agent ID to OS Assistant. OS Assistant will enable conversational memory in the cluster settings (if not already enabled). This ID will be used for future user chat queries. OS Assistant will manage conversations on the front end.
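A minimal sketch of steps 2-8 as direct ML Commons calls. The connector body, tool names (e.g. `SearchAlertsTool`), and agent parameters are illustrative assumptions:

```
# Steps 2-4: create the connector and register the remote model
POST /_plugins/_ml/connectors/_create
{
  "name": "assistant_llm_connector",
  "protocol": "http",
  "parameters": { "endpoint": "<llm-endpoint>", "model": "<model-name>" },
  "credential": { "api_key": "<api-key>" }
}

# Steps 5-6: register a sub-agent and the tools it uses; returns an agent ID
POST /_plugins/_ml/agents/_register
{
  "name": "alerting_sub_agent",
  "type": "flow",
  "tools": [ { "type": "SearchAlertsTool" } ]
}

# Steps 7-8: register the root agent, nesting each sub-agent as an AgentTool
POST /_plugins/_ml/agents/_register
{
  "name": "os_assistant_root_agent",
  "type": "conversational",
  "llm": { "model_id": "<model-id>", "parameters": { "max_iteration": 5 } },
  "memory": { "type": "conversation_index" },
  "tools": [ { "type": "AgentTool", "parameters": { "agent_id": "<sub-agent-id>" } } ]
}
```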
Chat flow (diagram):
  1. The user sends a question to the OS Assistant.
  2. OS Assistant passes the user's question, along with the user's chat history if there is one, to the Agent Framework.
  3. The Agent Framework interfaces with the Model Serving Framework to access the model based on the user's question.
  4. The Model Serving Framework accesses the model for the OS Assistant.
  5. The model response is returned to the Agent Framework to inform how the tools should be used.
  6. The Agent Framework accesses OpenSearch and plugins using the tools within the Agent Framework.
  7. OpenSearch and plugins return the data to the Agent Framework.
  8. The Agent Framework collates the responses and returns the answer to the user's question to OS Assistant.
  9. The memory ID for the user's chat session is created or updated based on the user's question (see the memory API sketch after these steps).
  10. The response to the user's question is returned, along with traces of how the response was generated.
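Steps 9-10 lean on ML Commons conversational memory. A minimal sketch, assuming the memory feature flag and the memory/trace read APIs (exact paths may differ by version):

```
# Enable conversational memory once during setup, if not already enabled
PUT /_cluster/settings
{
  "persistent": { "plugins.ml_commons.memory_feature_enabled": true }
}

# Step 9: list the messages recorded for a chat session
GET /_plugins/_ml/memory/<memory-id>/messages

# Step 10: fetch the traces behind a single answer for explainability
GET /_plugins/_ml/memory/message/<message-id>/traces
```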
Pros:
Cons:

Approach 3: Use Langchain and Model Serving Framework

Setup flow (diagram):
  1. A template is provided to OS Assistant to initialize the assistant. The cluster super admin will normally execute this setup (it is also possible for another user with sufficient roles/permissions to do this, and this is open to debate).
  2. OS Assistant will call the Model Serving Framework to establish the connection with the LLM.
  3. The Model Serving Framework will create the connector to the LLM.
  4. The Model Serving Framework will return the model ID.
Chat flow (diagram):

Langchain (reference) is a framework for developing applications powered by LLMs. In this solution, we would implement the Langchain framework, build a set of tools for it, and use it to interact with OpenSearch.

  1. The user sends a question to the OS Assistant.
  2. OS Assistant passes the user's question, along with the user's chat history if there is one, to Langchain.
  3. Langchain interfaces with the Model Serving Framework to access the model based on the user's question.
  4. The Model Serving Framework accesses the model for the OS Assistant.
  5. The model response is returned to Langchain to inform how the tools should be used.
  6. Langchain accesses OpenSearch and plugins using the tools within Langchain.
  7. OpenSearch and plugins return the data to Langchain.
  8. Langchain collates the responses and returns the answer to the user's question to OS Assistant.
  9. The response to the user's question is returned, along with traces of how the response was generated.
Pros:
Cons:

Workflows

Below are workflows, based on the recommended design, showing what happens during cluster setup and the chat user flow.

Cluster setup (diagram)

Chat user flow (diagram)

Test Framework

Today, whenever there's a change in a prompt or model, we need to manually re-test all the previously tested questions and manually verify the responses. This process is tedious and prone to missing regressions. With this test framework, we want to automate most of these steps and provide more objective ways of testing the LLM responses. This way, we can take a more test-driven approach to any prompt or model changes. Overall, the framework provides value to OS Assistant in the following ways:

  1. Regression tests for comparing prompt changes
  2. Testing and comparing different models with the same prompts

Approach 1 [Recommended]


The user initiates a test run, which triggers a Promptfoo process with all the prompt inputs and configurations. The test process calls the OS Assistant APIs to test the LLM with combinations of prompts and inputs. OS Assistant internally uses the ML Commons APIs to make the actual calls to the LLM endpoints. Meanwhile, all the intermediate tool API calls are mocked with static responses by running OpenSearch Dashboards in a test environment. Once we have the LLM responses, they are tested against different types of assertions. The summary of the results is stored as JSON files.
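As a sketch, a Promptfoo configuration for this flow might look like the JSON below (Promptfoo also accepts YAML). The assistant endpoint, HTTP provider wiring, and assertions are illustrative assumptions rather than the finalized test contract:

```json
{
  "description": "Regression tests for OS Assistant prompts (illustrative)",
  "providers": [
    {
      "id": "https",
      "config": {
        "url": "http://localhost:5601/api/assistant/send_message",
        "method": "POST",
        "body": { "question": "{{question}}" }
      }
    }
  ],
  "tests": [
    {
      "vars": { "question": "How many error logs came in over the last hour?" },
      "assert": [
        { "type": "contains", "value": "error" },
        { "type": "llm-rubric", "value": "Answers with a count of error logs" }
      ]
    }
  ]
}
```

Each run would write its summary to JSON files that can be diffed across prompt or model revisions to spot regressions.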

Components for the Test Framework:

Pros:
Cons:

Approach 2:


In this approach, there is only one difference: the Promptfoo process creates prompt template inputs dynamically from the OS Assistant codebase to test intermediate steps in the agents. Also, the test process calls the ML Commons APIs directly rather than going through OS Assistant.

Pros:
Cons:

Security

The assistant is just an interface to the LLM and the ML Commons agent tools. The assistant will run the OpenSearch APIs and access data as the user, since it calls the Agent Framework as the user interacting with the OpenSearch Assistant. This prevents users from escalating privileges and accessing data they do not have access to. Additionally, the chat sessions are stored in the Agent Framework, so the assistant does not have to worry about managing that data for the user.

References

hdhalter commented 6 months ago

Hi @lezzago, are there any documentation implications for 2.13? Thanks.