opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g., k-means and linear regression, to help developers build ML-related features within OpenSearch.

[FEATURE] Memory Enhancement to Support more LLM applications #1614

Closed. Zhangxunmt closed this issue 7 months ago.

Zhangxunmt commented 10 months ago

In the Agent Framework, we want to enhance ml-commons to interact with all kinds of LLMs, such as Claude and the models hosted on Amazon Bedrock. The use cases of the agent framework will not be limited to the search scenario; they also include chatbots, forecasting, and more. To this end, the memory component needs to be extended and refactored to support more applications, acting as a new data layer in ml-commons between the public APIs and the system indices. This document focuses on the design of this memory layer that supports the Agent Framework in ml-commons.

Architecture

A memory system needs to support two basic actions: reading and writing. Recall that every agent defines some core execution logic that expects certain inputs. Some of these inputs come directly from the user, while others come from memory. An agent will interact with its memory system twice in a given run.

  1. AFTER receiving the initial user inputs but BEFORE executing the core logic, an agent will READ from its memory system and augment the user inputs.
  2. AFTER executing the core logic but BEFORE returning the answer, an agent will WRITE the inputs and outputs of the current run to memory, so that they can be referred to in future runs.
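
As a rough illustration, the two steps above could map onto the memory APIs as follows. The read and write endpoints shown here are assumptions that follow the path conventions used for the update APIs later in this document; they are a sketch, not finalized API signatures.

Read (before executing the core logic):

GET /_plugins/_ml/memory/<memory_id>

Write (after executing the core logic):

POST /_plugins/_ml/memory/<memory_id>
{
    "input": "<user input for this run>",
    "response": "<LLM response for this run>"
}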

As outlined in the high-level design, CRUD-like APIs will be added to the memory system for two resources: Conversation and Interaction. Conversations are made up of interactions, and an interaction represents a pair of messages: a human input and an artificial intelligence (AI) response. These RESTful APIs already exist in conversational search. To support the Agent Framework, the mappings and schemas of Conversation and Interaction will be updated in a backward-compatible way.

The sequential flow of these APIs is summarized below. We plan to support three types of query algorithms.

(diagram: sequential flow of the memory APIs)

System Index and Mappings

We will reuse the two system indices that were created for conversational search. The index “conversation-meta” stores conversation metadata, and “conversation-interactions” stores every interaction between the user input and the LLM response. To support more applications, a new field “application_type” is added to the “conversation-meta” mapping to distinguish conversations from different applications. For example, a chatbot calls Fractal/Agent to create new conversations, and the chatbot agent writes “chatbot” into the “application_type” field of each conversation. Conversations created in conversational search have a null/empty “application_type” because their APIs do not include this new field. When ingesting new interactions into a conversation, ml-commons needs to make sure that chatbot interactions reference only chatbot conversations, pipeline interactions reference only pipeline conversations, and so on.
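
For illustration, a chatbot agent could set the new field at conversation-creation time. The request below assumes the existing create-conversation endpoint from conversational search, extended with the new field; the exact path and payload are assumptions for this sketch, not a finalized API.

POST /_plugins/_ml/memory/conversation
{
    "name": "<conversation name>",
    "application_type": "chatbot"
}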

In the “conversation-interactions” index, the new fields are mostly flat objects that are general enough to fit use cases beyond just the chatbot. The updated mappings of these two system indices are listed below.

conversation-meta

.plugins-ml-conversation-meta
{
    "_meta": {
        "schema_version": 1
    },
    "properties": {
        "name": {"type": "keyword"},
        "create_time": {"type": "date", "format": "strict_date_time||epoch_millis"},
        "user": {"type": "keyword"},
        "application_type": {"type": "keyword"}
    }
}

conversation-interactions

.plugins-ml-conversation-interactions
{
    "_meta": {
        "schema_version": 1
    },
    "properties": {
        "conversation_id": {"type": "keyword"},
        "create_time": {"type": "date", "format": "strict_date_time||epoch_millis"},
        "input": {"type": "text"},
        "prompt_template": {"type": "text"},
        "response": {"type": "text"},
        "origin": {"type": "keyword"},
        "additional_info": {"type": "flat_object"},
        "parent_interaction_id": {"type": "keyword"},
        "trace_number": {"type": "long"}
    }
}
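
As a hypothetical example, a single trace step produced by an agent tool call could be stored as a document like the one below, where “parent_interaction_id” points back to the root interaction of the run and “trace_number” orders the steps. All values are placeholders chosen for this sketch.

{
    "conversation_id": "<conversation_id>",
    "create_time": "2023-11-01T18:37:11.000Z",
    "input": "<input passed to the tool for this step>",
    "response": "<output returned by the tool>",
    "origin": "<tool or agent name>",
    "additional_info": {"notes": "<any application-specific context>"},
    "parent_interaction_id": "<root_interaction_id>",
    "trace_number": 1
}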

New APIs:

Update Interactions (needs revisiting):

The chatbot needs to use the Update Interaction API to add or update the contents of an interaction, including adding new fields such as “notes” and “post_process_response”.

PUT /_plugins/_ml/memory/<memory_id>/<interaction_id>
{
    "input": "How do I make an interaction?",
    "prompt_template": "Hello OpenAI, can you answer this question? Here's some extra info that may help. [INFO] \n [QUESTION]",
    "response": "Hello, this is OpenAI. Here is the answer to your question.",
    "origin": "MyFirstOpenAIWrapper",
    "additional_info": {
        "notes": "additional text related to the answer, e.g. a JSON or other semi-structured response",
        "suggestion": { ... },
        "reference": { ... },
        "post_process_response": { ... }
    }
}

Update Conversations:

This allows users to update conversation metadata such as the name and description.

PUT /_plugins/_ml/memory/<memory_id>
{
   "name": "new conversation name",
   "description": "this is a memory for chatbot" 
}
HenryL27 commented 10 months ago

Thanks @Zhangxunmt! I have a couple questions:

  1. for interaction-level vector search, is the plan to turn the interactions index into a knn index? What embedding model will you use? I guess to perform the search itself you'll use the apis introduced in #1504 ?
  2. The interaction-level "origin" field represents almost the same thing as the new "application_type" field (or is meant to). Maybe we can have the names agree with each other to make that more clear? (e.g. "application_type" -> "origin_type")
  3. The "additional_info" field is meant as a catch-all for other application-specific information. Is it feasible to pack and unpack "trace_number", "references", and "post_process_response" into a single string? I guess if you need to search over those fields, then maybe not. btw, what does the trace number do?
  4. Let's follow the endpoint naming conventions from #1268 (implemented in the above PR) and use PUT /_plugins/_ml/memory/conversation/{conversation_id}/_update and PUT /_plugins/_ml/memory/conversation/{conversation_id}/{interaction_id}/_update
  5. I also worry a little about allowing arbitrary field additions via update? It's probably fine
austintlee commented 10 months ago

For the update API, I think certain parts of an interaction should be immutable, e.g. user input and LLM response. I don't know if versioning interactions is the way to go, but we should think about the immutability aspect.

Also, how important is it to support role-based access control for conversations and interactions? Is that going to be a blocker for this work?

ylwu-amzn commented 10 months ago

Also, how important is it to support role-based access control for conversations and interactions? Is that going to be a blocker for this work?

I think we don't have strong requirements for role-based access control for now. We can always add it in the future; it's not a one-way door.

navneet1v commented 10 months ago

@austintlee , @HenryL27 , @ylwu-amzn , @Zhangxunmt

Given that the index name is tied closely to the conversation use case, I was thinking of stripping “conversation” from the index name to make the index available for other use cases that require memory. With that change, this can become a pure memory layer for any kind of ML use case.

Please let me know your thoughts.

Zhangxunmt commented 10 months ago

How about renaming the indices to the following names?

plugins-ml-conversation-meta -> plugins-ml-memory-meta
plugins-ml-conversation-interactions -> plugins-ml-memory-message

This change will be a breaking change. Anyone who has created conversations will lose all their data after the name change. Do you all agree? @austintlee @HenryL27 @navneet1v

navneet1v commented 10 months ago

This change will be a breaking change. Anyone who has created conversations will lose all their data after the name change. Do you all agree?

I agree, given that the feature was in preview. I think we discussed this in the last ML call.

For users who are already using it, can we provide a way for their old data to be migrated to the new indexes after an upgrade?
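
One possible migration path, sketched here as an assumption rather than anything agreed in this thread, is to copy documents from the old indices into the renamed ones with the standard _reindex API, assuming the renamed indices keep the dot prefix that today's system indices use. In practice this would likely have to run inside the plugin, since system-index protection can block direct client access.

POST /_reindex
{
    "source": { "index": ".plugins-ml-conversation-meta" },
    "dest": { "index": ".plugins-ml-memory-meta" }
}

POST /_reindex
{
    "source": { "index": ".plugins-ml-conversation-interactions" },
    "dest": { "index": ".plugins-ml-memory-message" }
}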