austintlee closed this 4 months ago
I mean not just ml-commons; other components also need this conversation/memory layer. ml-commons can use it, but it's not necessary to build the whole layer into ml-commons. I think it can be reused by other components/plugins too, so keeping it in a separate plugin can make the architecture clearer. For example, a plugin like Alerting may need a memory layer too, but they don't need ML. Why should they have to add ml-commons as a dependency? They can just depend on the dedicated conversation/memory thing.
I'm curious to know more about why other plugins might need this memory layer. Is it to communicate with an LLM? If that's the case, for other plugins to talk to an LLM (either remote or local), they would need to have ml-commons as a dependency anyway.
@sean-zheng-amazon @ylwu-amzn
I appreciate your point of view that this might be more general. Conversational memory arose from the LLM use cases, and I have not yet seen customers ask for that separately.
Over time, there may be many uses where people don't want it in ML. Using the customer obsession and working backwards principle, let's wait for the users to tell us that.
Certainly, conversational memory is needed for ML. Can we at least first agree that it should be part of the default bundle?
@austintlee @jonfritz
I think @dhrubo-os is making a good point above. Given that the use cases for conversational memory are tied to conversations, which are then tied to interacting with an LLM, it seems like ml-commons would be a dependency for all (or at least the vast majority) of applications that would need conversational memory anyway.
I'm good with ml-commons; it seems like the logical place to start. Curious if anyone is still opposed to this argument? If so, it would be helpful to have a crisp example of when conversational memory would be used outside of LLMs/ML/AI.
If not, can we close on this decision and update the RFC?
@mashah @jonfritz just to clarify, my point is NOT that customers might ask for conversation memory separately; my concern is that there are a certain number of customers who might not need conversation memory at all. E.g. most neural search/k-NN search users don't need conversational search support. Bundling everything into a centralized ml-commons plugin will eventually make it cumbersome and waste customers' precious resources. Splitting can enable customers to deploy the necessary plugins flexibly based on their needs. Thoughts?
Here we have two concepts: a general memory layer/framework, and conversational history/memory. I mean for ml-commons we need a general memory layer/framework. We could use conversational history as memory, but we don't have to. I suggest keeping the current conversational search scope as is. When the agent framework is ready, it can depend on conversation history as one memory type, but it doesn't have to; we could have other memory implementations. I think we don't need to couple things together.
So the conversation plugin will support conversation history CRUD, including the search pipeline processor. The ml-commons agent framework could use conversation history CRUD as one type of memory implementation.
@jonfritz can you summarize what's the most important thing you want to build for now? Let's keep the scope clear and prioritize the most important things. I feel that could make the discussion easier and we (here means our team, your team and any community developer) can build the most important things for your team first. Maybe a crisp list of function/feature items and explanation?
@ylwu-amzn what we want to do is 1/Add conversational memory to ml-commons, so we can use this building block for conversational search and other LLM-based applications that use conversations. This is critical for any conversational application, and the community on Slack and on this thread has defined ml-commons' scope to include this. 2/We don't see a crisp use case for conversational memory/context in this way outside of LLM-based applications. 3/ @ylwu-amzn the RFC explains what we want to build - do you have a specific question?
From the conversation above, the stated goals for ml-commons, and the conversation on Slack, I'm still having trouble understanding why this code wouldn't go into ml-commons. It's a core building block for LLM-augmented apps for search and conversation - it's required to store context. Let's make it easy for customers to build applications with OpenSearch by leveraging this in one place, and it seems like the community has stated goals for ml-commons to include these aspects of building AI/ML apps.
If there are use cases in the future outside of ML/AI, let's create a new framework then - but let's bias for action now, and get this into the hands of customers in the right way. Seems like @ylwu-amzn is suggesting that this functionality would end up in some way in ml-commons in the future, and @elfisher suggests we don't implement this twice. All of this points to ml-commons for this RFC. Would love to understand the new scope of ml-commons if we do not want to put conversational memory there, given the prior conversation describing ml-commons as the place for components supporting AI/ML.
@sean-zheng-amazon
Thanks for your clarification. There are lots of features in ml-commons that people may not need, but they come with ml-commons. It seems to me that separating everything would be messy.
Based on LLMs today, it seems to me that without conversational history, it's hard to build a chat application. So, it belongs in ML-commons if the set of LLM functionality goes in ML-commons.
@ylwu-amzn
I'm unsure about what you're asking. The RFC clearly describes what we are building now and there are links to code. Please take a look.
Together, we can decide how to change it and extend it. We are happy to work with you to figure that out.
Finally, I kindly ask for a clear answer: Will conversational history be a part of the default bundle?
Thanks @jonfritz ,
RFC explains what we want to build - do you have a specific question?
So I see this RFC propose to build
- A new plugin that provides a CRUD API to store and access conversation history (”memory”).
- A new Search Pipeline implementation that uses conversational memory and large language models for question answering.
- A new plugin that enables users to have conversations through a new Conversation API.
For
A new plugin that provides a CRUD API to store and access conversation history (”memory”).
I have the same proposal here. But it seems you have changed your mind to not build a new plugin; now you prefer to put the CRUD APIs in ml-commons, right?
For
A new Search Pipeline implementation that uses conversational memory and large language models for question answering.
Per my understanding, it's not reasonable to put search-related things into ml-commons. We have the neural-search plugin, which could be a good place for search-related things. Or create a new plugin for RAG search.
For
A new plugin that enables users to have conversations through a new Conversation API.
Agree; conversation is a special use case, like semantic search, which is implemented in a separate plugin: neural-search.
Am I understanding your proposal correctly?
@ylwu-amzn from feedback on the scope/goals of ml-commons and comments from the community, we have moved from the original idea of "a new plugin" to adding this to ml-commons. The functionality of what the component does hasn't changed, just where it lives. Given there is talk of adding agent related frameworks to ml-commons as well (from the chat in Slack and on #1161 ), it seems clear that conversational memory would also be needed alongside it in ml-commons. In #1161, you (@ylwu-amzn) wrote "And this also matches the long-term roadmap: use ml-commons as the commons layer for ml/AI framework. Train/predict API is not the whole thing for this layer." If the goal is this, FWIW, it seems like ml-commons would need conversational memory because any conversational app would need it.
Given your note here, are you now proposing that the agent-related work also go in a separate plugin now as well and not ml-commons?
@jonfritz, I think here in this RFC we see multiple items, not just conversation history, right? For example, you also plan to build a search pipeline processor, a new conversation API, etc. I don't think we should put all of these into ml-commons. See my suggestion https://github.com/opensearch-project/ml-commons/issues/1150#issuecomment-1665979389 . Can you give a crisp summarization of your current tech solution proposal?
Can you give a crisp summarization of your current tech solution proposal?
That is already provided in the RFC write-up above. As you said, we are proposing multiple components that go together to provide conversational search, which entails RAG. Conversation history is an important part of this solution. There appears to be a clear indication both here and on Slack that the community would like these components in ml-commons.
Together, we can decide how to change it and extend it. We are happy to work with you to figure that out.
Thanks, please do update the RFC according to the community discussion as @ylwu-amzn asked, e.g. instead of creating a new plugin, you now propose to include this function within ml-commons.
Finally, I kindly ask for a clear answer: Will conversational history be a part of the default bundle?
Personally I'm still not fully convinced, but I don't want to block the development either. We'll create a feature branch so everyone can add their code there. We can always refactor/split if we see the need in the future.
@sean-zheng-amazon Yes, we will update the RFC to indicate that we want these capabilities to live in ml-commons. And thank you so much for your offer to unblock the development of the work by creating a feature branch in ml-commons. I think that is the best path forward in terms of getting community visibility on the RFC. We are really looking forward to having the whole OpenSearch community collaborate with us on this feature and earning everyone's trust.
hi, guys, thanks a lot for the good discussion and we are going to create a new feature branch.
One update: we have some breaking changes from core and the 2.x branch is broken now; with #1187 merged, we will cut a new feature branch soon.
New feature branch created https://github.com/opensearch-project/ml-commons/tree/feature/conversation
@ylwu-amzn Thank you!
So, what's the protocol for working with feature branches? Since it is created to host work being proposed in this RFC, it may make sense for me to have permission to push to this branch. What do you think? Do we have a protocol for working with feature branches documented somewhere?
You can follow this doc https://github.com/opensearch-project/ml-commons/blob/2.x/DEVELOPER_GUIDE.md#fork-and-clone-ml-commons
For a feature branch, it should be the same. You can fork the repo and develop on your own fork first, then publish a PR to the feature branch. The PR will be reviewed, and two approvals are needed before merging.
I see there is already a huge conversation around why this feature is part of ml-commons rather than a different plugin. I echo the view that this should be outside of ml-commons and be part of a separate plugin. I have always seen the ml-commons plugin as a gateway to core ML capabilities, which can then be extended to do specific work like semantic search, Q&A, summarization, etc.
If we start putting features like the Conversation API, Chat APIs, and any new APIs that come in the future into ml-commons, ml-commons will become fat and unusable for other plugins that just want basic features like semantic search, Q&A, etc.
Adding APIs also comes with other baggage: we are adding new Q&A and Summarization processors, which are very generic and not only meant for the conversation context. These will add to the bloat of the ML plugin and make it unwieldy for the other plugins that use ml-commons as a dependency.
From the proposal, I can see we can break it down into what functionality belongs in ml-commons and what can go in a new plugin (or any other plugin where that feature can be used in a generic fashion).
I am aligned with what @sean-zheng-amazon was mentioning. Not every use case requires all the features of the ml-commons plugin, nor all of its dependencies. Semantic search and hybrid search are the top use cases in that area.
cc: @austintlee , @ylwu-amzn , @sean-zheng-amazon
As Sean stated above, and I re-iterated it in #1195, we decided to put the current work in a feature branch in ml-commons to let the development move forward. This is why our PRs are being raised there. We should not block review of the work to revisit this topic. I am not saying this discussion is closed, but just clarifying how we got here.
Adding some more clarity
@austintlee I think from the development side it's fine, and I am not blocking the development.
I was adding my thoughts to the decision of putting as a separate plugin or in ml plugin itself.
Great discussion here, and I'm probably late. A couple of questions:
we will use an OpenSearch index for it
For conversation history, is there a time period for which you'd want to keep these conversations? It feels like a lot of data that might not be needed after a point in time. I would love to see it configurable by the developer.
structure ConversationMetadata {
conversationId: ConversationId
numInteractions: Integer
createTime: Timestamp
lastInteractionTime: Timestamp
expiresAfter: Timestamp
name: String
}
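To make the proposed metadata concrete, here is a minimal Python sketch of the structure above with a hypothetical expiry check. The field types, epoch-second timestamps, and the `is_expired` helper are illustrative assumptions; the RFC only defines the field names:

```python
from dataclasses import dataclass

@dataclass
class ConversationMetadata:
    # Field names mirror the structure in the RFC; types are assumptions.
    conversation_id: str
    num_interactions: int
    create_time: float            # epoch seconds
    last_interaction_time: float  # epoch seconds
    expires_after: float          # epoch seconds; 0 taken to mean "never"
    name: str

    def is_expired(self, now: float) -> bool:
        """A conversation is expired once `now` passes expires_after."""
        return self.expires_after != 0 and now > self.expires_after

meta = ConversationMetadata("abc", 3, 1_000.0, 2_000.0, 5_000.0, "demo")
print(meta.is_expired(now=6_000.0))  # True: past the expiry timestamp
print(meta.is_expired(now=4_000.0))  # False: still live
```

A background job (or a deletion check on read) could use `is_expired` to implement the developer-configurable retention asked about above.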
Each ConversationMetadata and Interaction will have access controls linked to the specific user that creates them. Only Alice can add to and read from conversations that Alice owns. The main rationale for this is that Alice’s conversation will potentially include information from all documents Alice has access to, so her conversations’ access controls are maximally the intersection of Alice’s access rights. We plan to leverage OpenSearch’s existing access control mechanisms for this.
Feels like we plan to do authorization but not authentication, i.e. anybody who sends us a conversationId is automatically treated as authenticated. Do I understand this correctly?
@saratvemulapalli Unless I'm misunderstanding how the security plugin works, anyone who can hit the cluster must be authenticated with it enabled. If security is enabled, then authentication is a precondition to all of this; it's also a precondition for basic operations like search.
@hijakk I don't think your use case (vectorizing base64 encoded images) requires a (remote) inference model. You just need a way to make HTTP calls to an external service, right?
Correct, calls out to a remote http service would be sufficient
@saratvemulapalli Unless I'm misunderstanding how the security plugin works, anyone who can hit the cluster must be authenticated with it enabled. If security is enabled, then authentication is a precondition to all of this; it's also a precondition for basic operations like search.
I should've explained it better. @HenryL27 you are right, by the time a query reaches the search phases user authentication/authorization is done.
My question was: if Alice and Bob are valid users, and Alice makes a conversation with QA, they get a conversationId abc. What if Bob uses abc as the conversationId to continue the conversation? How do we verify that Alice has access to conversation abc and not anyone else?
Ah. Behind the scenes we're also writing down the user that owns each conversation in the high-level conversation object. When Bob tries to mess with conversation abc, we first check to see if he's the owner, and if not, we give him an access denied.
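The ownership check described here can be sketched in a few lines of Python. This is a hypothetical illustration of the logic, not the plugin's actual (Java) implementation; the in-memory dict stands in for the conversation index:

```python
# Stand-in store: the conversation object records its owner.
conversations = {"abc": {"owner": "Alice", "interactions": []}}

def add_interaction(user: str, conversation_id: str, text: str) -> str:
    """Knowing a conversationId is not enough; the caller must own it."""
    convo = conversations.get(conversation_id)
    if convo is None:
        return "not found"
    if convo["owner"] != user:
        return "access denied"  # Bob is rejected even though he knows the id
    convo["interactions"].append(text)
    return "ok"

print(add_interaction("Alice", "abc", "hi"))  # ok
print(add_interaction("Bob", "abc", "hi"))    # access denied
```

The key point is that the id alone is not a capability: authorization is re-checked against the stored owner on every access.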
@austintlee Should this issue be moved to 2.11?
Let me just quickly highlight what is being released in 2.10.
So, most of what we mentioned above in the RFC should be coming out in 2.10 as an experimental feature. It is being made available via the ml-commons plugin so it should be fairly easy for people to try out. We will have a tutorial to go with this release on how to use this feature.
Our work is not done. We want to make sure this feature goes GA by 2.11, and we have some improvements in mind. We are excited to make this available in 2.10 and are looking forward to feedback and suggestions. There are a lot of interesting things people are doing in the RAG space and we would love to work with the community to bring these ideas to OpenSearch!
@austintlee, can you clarify the purpose of this query block in the example that you provided in the RFC?
"ext": { "question_answering_parameters": { "question": "When was Abraham Lincoln born?" },
It's not clear why it repeats the query context: "query_text": "When was Abraham Lincoln born?",
Reference:
GET wiki-simple-paras/_search?search_pipeline=convo_qa_pipeline
{
  "_source": ["title", "text"],
  "query": {
    "neural": {
      "text_vector": {
        "query_text": "When was Abraham Lincoln born?",
        "k": 10,
        "model_id": "<...>"
      }
    }
  }
}
@dylan-tong-aws
Oftentimes, you may want to customize your query to OpenSearch (hybrid search, e.g.) and feed the result as additional context to an LLM so the current interface allows applications to construct the OS query and the LLM question as two inputs.
In trying to keep the example simple, I may have made it a bit confusing since it repeats the same question twice. But let's say you want to ask a follow-up question - "when did he die?" In this case, you won't want to pass that question as-is to OpenSearch as it won't know what you mean by "he". But the LLM will figure it out based on the chat history.
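The way chat history disambiguates a follow-up like "when did he die?" can be sketched as simple prompt assembly. The prompt layout and function name here are illustrative assumptions, not the processor's actual prompt template:

```python
def build_llm_prompt(chat_history, question, search_results):
    """Assemble an LLM prompt from prior turns plus retrieved passages.

    Because the LLM sees the earlier turns, a pronoun in the follow-up
    question ("he") can be resolved against them ("Abraham Lincoln").
    """
    history = "\n".join(f"{role}: {text}" for role, text in chat_history)
    context = "\n".join(search_results)
    return (
        f"Conversation so far:\n{history}\n\n"
        f"Search results:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_llm_prompt(
    chat_history=[("user", "When was Abraham Lincoln born?"),
                  ("assistant", "February 12, 1809.")],
    question="When did he die?",
    search_results=["Lincoln was assassinated in April 1865."],
)
print("Abraham Lincoln" in prompt)  # True: the history carries the referent
```

The OpenSearch query, by contrast, only sees `question` itself, which is why passing "when did he die?" to the retriever as-is would fail without rewriting.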
Using the 2.10 Release Candidate, I made some sample queries to demonstrate the point:
Query 1 (BM25 + KNN)
POST demo/_search?size=5&search_pipeline=demo_pipeline
{
"_source": ["title", "text"],
"query": {
"hybrid": {
"queries": [
{
"match": {
"text": "american presidents"
}
},
{
"neural": {
"text_vector": {
"query_text": "Was Abraham Lincoln a good politician?",
"k": 10,
"model_id": "<...>"
}
}
}
],
"boost": 1
}
},
"ext": {
"generative_qa_parameters": {
"llm_model": "gpt-3.5-turbo",
"llm_question": "Was Abraham Lincoln a good politician"
}
}
}
Query 2 (Term only)
POST demo/_search?size=5&search_pipeline=demo_pipeline
{
"_source": ["title", "text"],
"query": {
"hybrid": {
"queries": [
{
"term": {
"text": {
"value": "president",
"boost": 1
}
}
},
{
"bool": {
"should": [
{
"term": {
"text": {
"value": "character",
"boost": 1
}
}
},
{
"term": {
"text": {
"value": "politician",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
],
"boost": 1
}
},
"ext": {
"generative_qa_parameters": {
"llm_model": "gpt-3.5-turbo",
"llm_question": "Was Abraham Lincoln a good politician"
}
}
}
We can introduce question rewriting (when did he die -> when did Abraham Lincoln die), but this may require some new work in SearchQueryBuilder, maybe an extension similar to what neural search and hybrid search did (e.g. ConversationalSearchQueryBuilder).
@austintlee, so the query clause is the retriever part of the RAG workflow, correct? So, when a neural search query is being used with this pipeline, the initial query will probably be redundant. Is the idea that the subsequent queries like "when did he die" will be passed via "llm_question" and the neural search query will keep the original query context like "what was Abraham Lincoln's life like?"
What controls do I have around history context? I see examples where you can provide a conversation (session) id. Can I dynamically specify the context history, like "last=N" exchanges?
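A "last=N" style control would amount to truncating the stored interaction list before prompt assembly. A hypothetical sketch (the parameter name and semantics are assumptions, not a shipped option):

```python
def recent_context(interactions, last_n):
    """Keep only the most recent last_n exchanges for the LLM context.

    Older turns are dropped, bounding prompt size (and LLM token cost)
    regardless of how long the stored conversation grows.
    """
    return interactions[-last_n:] if last_n > 0 else []

history = ["q1/a1", "q2/a2", "q3/a3", "q4/a4"]
print(recent_context(history, last_n=2))  # ['q3/a3', 'q4/a4']
```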
Also, have you thought about extending the neural search interface so that we can avoid repeated questions in the query syntax?
Also, have you thought about extending the neural search interface so that we can avoid repeated questions in the query syntax?
Yes, we want to tackle this in the next iteration; it will simplify the experience. I think the confusion here comes from the fact that you currently have to enter each question twice when it doesn't have to be that way. As I stated above, I am considering a new search query type that gives the user the flexibility to ask one question, or one question plus an OpenSearch query (I gave two examples of this above).
Hi @austintlee, conversational memory will grow over time with the number of conversational searches customers make. Do we have any idea about conversational memory scalability?
@ylwu-amzn how's the appsec review going? Are we gonna hit GA for 2.12? thx
Folks,
We know that the conversational memory feature is experimental because of internal AWS processes needed to test the integrity of the feature.
Can we get a status on how your review and testing process is going? I believe these features were scheduled to go GA in 2.12. Since 2.12 is now delayed until 20 Feb, I am assuming that we have almost cleared the hurdle.
@sean-zheng-amazon @ylwu-amzn
@mashah yes, we are on track to GA the feature in 2.12. The pentest is scheduled to start 23 Jan and finish by 31 Jan. We still have a couple of weeks to fix any security issues caught in the test.
Yes, hopefully the pentest doesn't find many issues. If they find any, we will share them with your team.
Is the pentest on track for next week on 23 Jan for the conversational memory features?
yes we are on track
Can this be closed?
It's GA in 2.12. Closing.
Introduction
The recent advances in Large Language Models (LLMs) have enabled developers to utilize natural language in their applications with better quality and ability. As ChatGPT has shown, these LLMs strongly enable use cases involving summarization and conversation. However, when prompting LLMs to answer fact-based questions (applications we call “conversational search”), we find that there are significant shortcomings for enterprise-grade applications.
First, the major LLMs are not trained on datasets that are not exposed to the internet, and therefore do not have the context to answer questions on private data. Most enterprise data falls into this category. Second, the way in which LLMs answer questions based on their training data gives rise to “hallucinations” and false answers, which are not acceptable in applications for mission critical use cases.
End-users love the ability to converse using colloquial language with an application to get answers to questions or find interesting search results, but require up-to-date information and accuracy. A solution to this problem is through Retrieval Augmented Generation (RAG), where an application sends an LLM a superset of correct information in response to a prompt, and the LLM is used to summarize and extract information from this set (instead of probabilistically determining an answer).
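The RAG flow described above can be summarized in a few lines of Python. The retriever and LLM here are stubs standing in for OpenSearch semantic search and a hosted model; the function names are illustrative, not part of any proposed API:

```python
def retrieve(query):
    # Stand-in for an OpenSearch (semantic) search returning trusted passages.
    return ["Abraham Lincoln was born on February 12, 1809."]

def generate(prompt):
    # Stand-in for a call to an externally hosted LLM.
    return "Lincoln was born on February 12, 1809."

def rag_answer(question):
    """Retrieve first, then ask the LLM to answer from the retrieved text.

    Grounding the generation step in retrieved passages (rather than the
    model's training data) is what supplies private, up-to-date context
    and curbs hallucination.
    """
    passages = retrieve(question)
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}"
    return generate(prompt)

print(rag_answer("When was Abraham Lincoln born?"))
```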
We believe OpenSearch could be a great platform for building conversational search applications, and aligns well with the RAG approach. It already offers semantic search capabilities using its vector database and k-NN plug-in, alongside enterprise-grade security and scalability. This is a great building block for the “source of truth” information retrieval component of RAG. However, it currently lacks the primitives and crisp APIs to easily enable the conversational element.
Although there are libraries that allow for building this functionality at the application layer (e.g. LangChain), we believe the best developer experience would be to enable this directly in OpenSearch. We consider the “G” in a RAG pipeline as LLM-based post-processing to enable direct question answering, summarization, and a conversational experience on top of OpenSearch semantic search. This enables end-users to interact with their data in OpenSearch in new ways. Furthermore, we believe developers may want to use different LLMs, and that the choice of model should be pluggable.
Through using plugins and search pipelines, we propose an architecture in this RFC to expose easily consumable APIs for conversational search, history, and storage. We segment it into a few components, including: 1/search query rewriting using generative AI and conversational context, 2/question answering and summarization of OpenSearch semantic search queries using generative AI, and 3/a concept of “conversational memory” to easily store the state of conversations and add additional interactions. Conversational Memory will also support conversational applications that have multiple agents operating together, giving a single source of truth for conversation state.
Goals
1/ Developers can easily build conversational search applications (e.g. knowledge-base search, informational chatbot, etc.) using OpenSearch and their choice of generative AI model using well-defined REST APIs. Some of these applications will be an ongoing conversation, while others will be one-shot (and the history of interactions is not important).
2/ Developers can use OpenSearch to support multi-agent conversational architectures, which require a single “source of truth” for conversational history. Multi-agent architectures will have other agents besides that for semantic search with OpenSearch (e.g. an agent that queries the public internet). These developers need an easy API to manage conversational history, both in adding interactions to conversations and exploring history of those conversations.
3/ Developers can easily obtain OpenSearch (semantic) search results alongside the generative AI question answering, so they can show the source documents and enable the end user to explore the source material.
Non-Goals
1/ Building a general LLM application toolkit in OpenSearch. Our goal is just to enable conversational search and the related dependency of conversational memory.
2/ LLM hosting. LLMs take significant resources and should be operated outside of an OpenSearch cluster. We also hope to use the ML-Commons remote inference feature rather than implement our own connectors.
3/ A conversational search application platform. Our goal is to expose crisp APIs to make building applications that use conversational search easy, but not create the end application itself.
Proposed Architecture