austintlee closed this 4 months ago
I mean not just ml-commons; other components also need this conversation/memory layer. ml-commons can use it, but it's not necessary to build the whole layer into ml-commons. I think it can be reused by other components/plugins too, so keeping it in a separate plugin can make the architecture clearer. For example, a plugin like Alerting may need a memory layer too, but they don't need ML. Why should they have to add ml-commons as a dependency? They can just depend on the dedicated conversation/memory thing.
I'm curious to know more about why other plugins might need this memory layer. Is it to communicate with an LLM? If that's the case, for other plugins to talk to an LLM (either remote or local), they would need to have ml-commons as a dependency anyway.
@sean-zheng-amazon @ylwu-amzn
I appreciate your point of view that this might be more general. Conversational memory arose from the LLM use cases, and I have not yet seen customers ask for that separately.
Over time, there may be many uses where people don't want it in ML. Using the customer obsession and working backwards principle, let's wait for the users to tell us that.
Certainly, conversational memory is needed for ML. Can we at least first agree that it should be part of the default bundle?
@austintlee @jonfritz
I think @dhrubo-os is making a good point above. Given that the use cases for conversational memory are tied to conversations, which are then tied to interacting with an LLM, it seems like ml-commons would be a dependency for all (or at least the vast majority) of applications that would need conversational memory anyway.
I'm good with ml-commons; it seems like the logical place to start. Curious if anyone is still opposed to this argument? If so, it would be helpful to have a crisp example of when conversational memory would be used outside of LLMs/ML/AI.
If not, can we close on this decision and update the RFC?
@mashah @jonfritz just to clarify, my point is NOT that customers might ask for conversation memory separately; my concern is that there are a certain number of customers who might not need conversation memory at all. E.g. most neural search/k-NN search users don't need conversational search support. Bundling everything into a centralized ml-commons plugin will eventually make it cumbersome and waste customers' precious resources. Splitting can enable customers to deploy the necessary plugins flexibly based on their needs. Thoughts?
Here we have two concepts: a general memory layer/framework, and conversational history/memory. I mean for ml-commons we need a general memory layer/framework. We could use conversational history as memory, but we don't have to. I suggest keeping the current conversational search scope as is. When the agent framework is ready, it can depend on conversation history as one memory type, but it doesn't have to; we could have other memory implementations. I think we don't need to couple things together.
So the conversation plugin will support conversation history CRUD, including the search pipeline processor. The ml-commons agent framework could use conversation history CRUD as one type of memory implementation.
@jonfritz can you summarize what's the most important thing you want to build for now? Let's keep the scope clear and prioritize the most important things. I feel that could make the discussion easier and we (here means our team, your team and any community developer) can build the most important things for your team first. Maybe a crisp list of function/feature items and explanation?
@ylwu-amzn what we want to do is 1/Add conversational memory to ml-commons, so we can use this building block for conversational search and other LLM-based applications that use conversations. This is critical for any conversational application, and the community on Slack and on this thread has defined ml-commons' scope to include this. 2/We don't see a crisp use case for conversational memory/context in this way outside of LLM-based applications. 3/ @ylwu-amzn the RFC explains what we want to build - do you have a specific question?
From the conversation above, the stated goals for ml-commons, and the conversation on Slack, I'm still having trouble understanding why this code wouldn't go into ml-commons. It's a core building block for LLM-augmented apps for search and conversation - it's required to store context. Let's make it easy for customers to build applications with OpenSearch by leveraging this in one place, and it seems like the community has stated goals for ml-commons to include these aspects of building AI/ML apps.
If there are use cases in the future outside of ML/AI, let's create a new framework then - but let's bias for action now, and get this into the hands of customers in the right way. Seems like @ylwu-amzn is suggesting that this functionality would end up in some way in ml-commons in the future, and @elfisher suggests we don't implement this twice. All of this points to ml-commons for this RFC. Would love to understand the new scope of ml-commons if we do not want to put conversational memory there, given the prior conversation describing ml-commons as the place for components supporting AI/ML.
@sean-zheng-amazon
Thanks for your clarification. There are lots of features in ml-commons that people may not need, but they come with ml-commons. It seems to me that separating everything would be messy.
Based on LLMs today, it seems to me that without conversational history, it's hard to build a chat application. So, it belongs in ML-commons if the set of LLM functionality goes in ML-commons.
@ylwu-amzn
I'm unsure about what you're asking. The RFC clearly describes what we are building now and there are links to code. Please take a look.
Together, we can decide how to change it and extend it. We are happy to work with you to figure that out.
Finally, I kindly ask for a clear answer: Will conversational history be a part of the default bundle?
Thanks @jonfritz ,
RFC explains what we want to build - do you have a specific question?
So I see this RFC propose to build
- A new plugin that provides a CRUD API to store and access conversation history (”memory”).
- A new Search Pipeline implementation that uses conversational memory and large language models for question answering.
- A new plugin that enables users to have conversations through a new Conversation API.
For
A new plugin that provides a CRUD API to store and access conversation history (”memory”).
I have the same proposal here. But it seems you have changed your mind to not build a new plugin; now you prefer to put the CRUD APIs in ml-commons, right?
For
A new Search Pipeline implementation that uses conversational memory and large language models for question answering.
Per my understanding, it's not reasonable to put search-related things into ml-commons. We have the neural-search plugin, which could be a good place for search-related things. Or create a new plugin for RAG search.
For
A new plugin that enables users to have conversations through a new Conversation API.
Agree; conversation is a special use case, like semantic search, which is implemented in a separate plugin: neural-search.
Am I understanding your proposal correctly?
@ylwu-amzn from feedback on the scope/goals of ml-commons and comments from the community, we have moved from the original idea of "a new plugin" to adding this to ml-commons. The functionality of what the component does hasn't changed, just where it lives. Given there is talk of adding agent related frameworks to ml-commons as well (from the chat in Slack and on #1161 ), it seems clear that conversational memory would also be needed alongside it in ml-commons. In #1161, you (@ylwu-amzn) wrote "And this also matches the long-term roadmap: use ml-commons as the commons layer for ml/AI framework. Train/predict API is not the whole thing for this layer." If the goal is this, FWIW, it seems like ml-commons would need conversational memory because any conversational app would need it.
Given your note here, are you now proposing that the agent-related work also go in a separate plugin now as well and not ml-commons?
@jonfritz, I think here in this RFC we see multiple items, not just conversation history, right? For example, you also plan to build a search pipeline processor, a new conversation API, etc. I don't think we should put all of these into ml-commons. See my suggestion https://github.com/opensearch-project/ml-commons/issues/1150#issuecomment-1665979389 . Can you give a crisp summarization of your current tech solution proposal?
Can you give a crisp summarization of your current tech solution proposal?
That is already provided in the RFC write-up above. As you said, we are proposing multiple components that go together to provide conversational search, which entails RAG. Conversation history is an important part of this solution. There appears to be a clear indication both here and on Slack that the community would like these components in ml-commons.
Together, we can decide how to change it and extend it. We are happy to work with you to figure that out.
Thanks, please do update the RFC according to the community discussion as @ylwu-amzn asked, e.g. instead of creating a new plugin, you now propose to include this function within ml-commons.
Finally, I kindly ask for a clear answer: Will conversational history be a part of the default bundle?
Personally I'm still not fully convinced, but I don't want to block the development either. We'll create a feature branch so everyone can add their code there. We can always refactor/split if we see the need in the future.
@sean-zheng-amazon Yes, we will update the RFC to indicate that we want these capabilities to live in ml-commons. And thank you so much for your offer to unblock the development of the work by creating a feature branch in ml-commons. I think that is the best path forward in terms of getting community visibility on the RFC. We are really looking forward to having the whole OpenSearch community collaborate with us on this feature and earning everyone's trust.
hi, guys, thanks a lot for the good discussion and we are going to create a new feature branch.
One update: we have some breaking changes from core and the 2.x branch is broken now; with #1187 merged, we will cut a new feature branch soon.
New feature branch created https://github.com/opensearch-project/ml-commons/tree/feature/conversation
@ylwu-amzn Thank you!
So, what's the protocol for working with feature branches? Since it is created to host work being proposed in this RFC, it may make sense for me to have permission to push to this branch. What do you think? Do we have a protocol for working with feature branches documented somewhere?
You can follow this doc https://github.com/opensearch-project/ml-commons/blob/2.x/DEVELOPER_GUIDE.md#fork-and-clone-ml-commons
For a feature branch, it should be the same. You can fork the repo and develop on your own fork first, then publish a PR to the feature branch. The PR will be reviewed, and two approvals are needed before merging.
I see there is already a huge conversation around why this feature is part of ml-commons rather than a different plugin. I echo the view that this should be outside of ml-commons and be part of a separate plugin. I have always seen the ml-commons plugin as a gateway to core ML capabilities, which can then be extended to do specific work like semantic search, Q&A, summarization, etc.
If we start putting features like the Conversation API, Chat APIs, and any new APIs that come in the future into ml-commons, ml-commons will become fat and unusable for other plugins that just want basic features like semantic search, Q&A, etc.
Adding APIs also comes with other baggage: we are adding new Q&A and Summarization processors, which are very generic and not only meant for the conversation context. These will add to the bloat of the ML plugin and make it unwieldy for the other plugins that use ml-commons as a dependency.
From the proposal, I can see we can break it down into what functionality belongs in ml-commons and what can go in a new plugin (or any other plugin where that feature can be used in a generic fashion).
I am aligned with what @sean-zheng-amazon was mentioning. Not every use case requires all the features of the ml-commons plugin, nor all of its dependencies. Semantic search and hybrid search are the top use cases in that area.
cc: @austintlee , @ylwu-amzn , @sean-zheng-amazon
As Sean stated above, and I re-iterated it in #1195, we decided to put the current work in a feature branch in ml-commons to let the development move forward. This is why our PRs are being raised there. We should not block review of the work to revisit this topic. I am not saying this discussion is closed, but just clarifying how we got here.
Adding some more clarity
@austintlee I think from the development side it's fine, and I am not blocking the development.
I was adding my thoughts to the decision of putting as a separate plugin or in ml plugin itself.
Great discussion here, and I'm probably late. A couple of questions:
we will use an OpenSearch index for it
For conversation history, is there a time period for which you'd want to keep these conversations? It feels like a lot of data that might not be needed after a point in time. I would love to see it configurable by the developer.
structure ConversationMetadata {
conversationId: ConversationId
numInteractions: Integer
createTime: Timestamp
lastInteractionTime: Timestamp
expiresAfter: Timestamp
name: String
}
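To make the proposed metadata concrete, here is a minimal Python sketch of the structure above with a hypothetical expiry check. The field types, epoch-second timestamps, and the `is_expired` helper are illustrative assumptions; the RFC only defines the field names:

```python
from dataclasses import dataclass

@dataclass
class ConversationMetadata:
    # Field names mirror the structure in the RFC; types are assumptions.
    conversation_id: str
    num_interactions: int
    create_time: float            # epoch seconds
    last_interaction_time: float  # epoch seconds
    expires_after: float          # epoch seconds; 0 taken to mean "never"
    name: str

    def is_expired(self, now: float) -> bool:
        """A conversation is expired once `now` passes expires_after."""
        return self.expires_after != 0 and now > self.expires_after

meta = ConversationMetadata("abc", 3, 1_000.0, 2_000.0, 5_000.0, "demo")
print(meta.is_expired(now=6_000.0))  # True: past the expiry timestamp
print(meta.is_expired(now=4_000.0))  # False: still live
```

A background job (or a deletion check on read) could use `is_expired` to implement the developer-configurable retention asked about above.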
Each ConversationMetadata and Interaction will have access controls linked to the specific user that creates them. Only Alice can add to and read from conversations that Alice owns. The main rationale for this is that Alice’s conversation will potentially include information from all documents Alice has access to, so her conversations’ access controls are maximally the intersection of Alice’s access rights. We plan to leverage OpenSearch’s existing access control mechanisms for this.
Feels like we plan to do authorization but not authentication, i.e. anybody who sends us a conversationId is automatically treated as authenticated. Do I understand this correctly?
@saratvemulapalli Unless I'm misunderstanding how the security plugin works, anyone who can hit the cluster must be authenticated with it enabled. If security is enabled, then authentication is a precondition to all of this; it's also a precondition for basic operations like search.
@hijakk I don't think your use case (vectorizing base64 encoded images) requires a (remote) inference model. You just need a way to make HTTP calls to an external service, right?
Correct, calls out to a remote http service would be sufficient
@saratvemulapalli Unless I'm misunderstanding how the security plugin works, anyone who can hit the cluster must be authenticated with it enabled. If security is enabled, then authentication is a precondition to all of this; it's also a precondition for basic operations like search.
I should've explained it better. @HenryL27 you are right, by the time a query reaches the search phases user authentication/authorization is done.
My question was: if Alice and Bob are valid users, and Alice makes a conversation with QA, they get a conversationId abc. What if Bob uses abc as the conversationId to continue the conversation? How do we verify that Alice has access to conversation abc and not anyone else?
Ah. Behind the scenes we're also writing down the user that owns each conversation in the high-level conversation object. When Bob tries to mess with conversation abc, we first check to see if he's the owner, and if not, we give him an access denied.
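The ownership check described here can be sketched in a few lines of Python. This is a hypothetical illustration of the logic, not the plugin's actual (Java) implementation; the in-memory dict stands in for the conversation index:

```python
# Stand-in store: the conversation object records its owner.
conversations = {"abc": {"owner": "Alice", "interactions": []}}

def add_interaction(user: str, conversation_id: str, text: str) -> str:
    """Knowing a conversationId is not enough; the caller must own it."""
    convo = conversations.get(conversation_id)
    if convo is None:
        return "not found"
    if convo["owner"] != user:
        return "access denied"  # Bob is rejected even though he knows the id
    convo["interactions"].append(text)
    return "ok"

print(add_interaction("Alice", "abc", "hi"))  # ok
print(add_interaction("Bob", "abc", "hi"))    # access denied
```

The key point is that the id alone is not a capability: authorization is re-checked against the stored owner on every access.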
@austintlee Should this issue be moved to 2.11?
Let me just quickly highlight what is being released in 2.10.
So, most of what we mentioned above in the RFC should be coming out in 2.10 as an experimental feature. It is being made available via the ml-commons plugin so it should be fairly easy for people to try out. We will have a tutorial to go with this release on how to use this feature.
Our work is not done. We want to make sure this feature goes GA by 2.11, and we have some improvements in mind. We are excited to make this available in 2.10 and are looking forward to feedback and suggestions. There are a lot of interesting things people are doing in the RAG space and we would love to work with the community to bring these ideas to OpenSearch!
@austintlee, can you clarify the purpose of this query block in the example that you provided in the RFC?
"ext": { "question_answering_parameters": { "question": "When was Abraham Lincoln born?" },
It's not clear why it repeats the query context: "query_text": "When was Abraham Lincoln born?",
Reference:
GET wiki-simple-paras/_search?search_pipeline=convo_qa_pipeline
{
  "_source": ["title", "text"],
  "query": {
    "neural": {
      "text_vector": {
        "query_text": "When was Abraham Lincoln born?",
        "k": 10,
        "model_id": "<...>"
      }
    }
  }
}
@dylan-tong-aws
Oftentimes, you may want to customize your query to OpenSearch (hybrid search, e.g.) and feed the result as additional context to an LLM so the current interface allows applications to construct the OS query and the LLM question as two inputs.
In trying to keep the example simple, I may have made it a bit confusing since it repeats the same question twice. But let's say you want to ask a follow-up question - "when did he die?" In this case, you won't want to pass that question as-is to OpenSearch as it won't know what you mean by "he". But the LLM will figure it out based on the chat history.
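The way chat history disambiguates a follow-up like "when did he die?" can be sketched as simple prompt assembly. The prompt layout and function name here are illustrative assumptions, not the processor's actual prompt template:

```python
def build_llm_prompt(chat_history, question, search_results):
    """Assemble an LLM prompt from prior turns plus retrieved passages.

    Because the LLM sees the earlier turns, a pronoun in the follow-up
    question ("he") can be resolved against them ("Abraham Lincoln").
    """
    history = "\n".join(f"{role}: {text}" for role, text in chat_history)
    context = "\n".join(search_results)
    return (
        f"Conversation so far:\n{history}\n\n"
        f"Search results:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_llm_prompt(
    chat_history=[("user", "When was Abraham Lincoln born?"),
                  ("assistant", "February 12, 1809.")],
    question="When did he die?",
    search_results=["Lincoln was assassinated in April 1865."],
)
print("Abraham Lincoln" in prompt)  # True: the history carries the referent
```

The OpenSearch query, by contrast, only sees `question` itself, which is why passing "when did he die?" to the retriever as-is would fail without rewriting.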
Using the 2.10 Release Candidate, I made some sample queries to demonstrate the point:
Query 1 (BM25 + KNN)
POST demo/_search?size=5&search_pipeline=demo_pipeline
{
"_source": ["title", "text"],
"query": {
"hybrid": {
"queries": [
{
"match": {
"text": "american presidents"
}
},
{
"neural": {
"text_vector": {
"query_text": "Was Abraham Lincoln a good politician?",
"k": 10,
"model_id": "<...>"
}
}
}
],
"boost": 1
}
},
"ext": {
"generative_qa_parameters": {
"llm_model": "gpt-3.5-turbo",
"llm_question": "Was Abraham Lincoln a good politician"
}
}
}
Query 2 (Term only)
POST demo/_search?size=5&search_pipeline=demo_pipeline
{
"_source": ["title", "text"],
"query": {
"hybrid": {
"queries": [
{
"term": {
"text": {
"value": "president",
"boost": 1
}
}
},
{
"bool": {
"should": [
{
"term": {
"text": {
"value": "character",
"boost": 1
}
}
},
{
"term": {
"text": {
"value": "politician",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
],
"boost": 1
}
},
"ext": {
"generative_qa_parameters": {
"llm_model": "gpt-3.5-turbo",
"llm_question": "Was Abraham Lincoln a good politician"
}
}
}
We can introduce question rewriting (when did he die -> when did Abraham Lincoln die), but this may require some new work in SearchQueryBuilder, maybe an extension similar to what neural search and hybrid search did (e.g. ConversationalSearchQueryBuilder).
@austintlee, so the query clause is the retriever part of the RAG workflow, correct? So, when a neural search query is being used with this pipeline, the initial query will probably be redundant. Is the idea that the subsequent queries like "when did he die" will be passed via "llm_question" and the neural search query will keep the original query context like "what was Abraham Lincoln's life like?"
What controls do I have around history context? I see examples where you can provide a conversation (session) id. Can I dynamically specify the context history, like "last=N" exchanges?
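A "last=N" style control would amount to truncating the stored interaction list before prompt assembly. A hypothetical sketch (the parameter name and semantics are assumptions, not a shipped option):

```python
def recent_context(interactions, last_n):
    """Keep only the most recent last_n exchanges for the LLM context.

    Older turns are dropped, bounding prompt size (and LLM token cost)
    regardless of how long the stored conversation grows.
    """
    return interactions[-last_n:] if last_n > 0 else []

history = ["q1/a1", "q2/a2", "q3/a3", "q4/a4"]
print(recent_context(history, last_n=2))  # ['q3/a3', 'q4/a4']
```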
Also, have you thought about extending the neural search interface so that we can avoid repeated questions in the query syntax?
Also, have you thought about extending the neural search interface so that we can avoid repeated questions in the query syntax?
Yes, we want to tackle this in the next iteration; it will simplify the experience. I think the confusion here comes from the fact that you currently have to enter each question twice when it doesn't have to be that way. As I stated above, I am considering a new search query type that gives the user the flexibility to ask one question, or one question plus an OpenSearch query (I gave two examples of this above).
Hi @austintlee, conversational memory will grow over time with the number of conversational searches customers make. Do we have any idea about conversational memory scalability?
@ylwu-amzn how's the appsec review going? Are we gonna hit GA for 2.12? thx
Folks,
We know that the conversational memory feature is experimental because of internal AWS processes needed to test the integrity of the feature.
Can we get a status on how your review and testing process is going? I believe these features were scheduled to go GA in 2.12. Since 2.12 is now delayed until 20 Feb, I am assuming that we have almost cleared the hurdle.
@sean-zheng-amazon @ylwu-amzn
@mashah yes, we are on track to GA the feature in 2.12. The pentest is scheduled to start 23 Jan and finish by 31 Jan. We still have a couple of weeks to fix any security issues caught in the test.
Yes, hopefully the pentest doesn't find many issues. If they find any, we will share them with your team.
Is the pentest on track for next week on 23 Jan for the conversational memory features?
yes we are on track
Can this be closed?
It's GA in 2.12. Closing.
Introduction
The recent advances in Large Language Models (LLMs) have enabled developers to utilize natural language in their applications with better quality and ability. As ChatGPT has shown, these LLMs strongly enable use cases involving summarization and conversation. However, when prompting LLMs to answer fact-based questions (applications we call “conversational search”), we find that there are significant shortcomings for enterprise-grade applications.
First, the major LLMs are not trained on datasets that are not exposed to the internet, and therefore do not have the context to answer questions on private data. Most enterprise data falls into this category. Second, the way in which LLMs answer questions based on their training data gives rise to “hallucinations” and false answers, which are not acceptable in applications for mission critical use cases.
End-users love the ability to converse using colloquial language with an application to get answers to questions or find interesting search results, but require up-to-date information and accuracy. A solution to this problem is through Retrieval Augmented Generation (RAG), where an application sends an LLM a superset of correct information in response to a prompt, and the LLM is used to summarize and extract information from this set (instead of probabilistically determining an answer).
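The RAG flow described above can be summarized in a few lines of Python. The retriever and LLM here are stubs standing in for OpenSearch semantic search and a hosted model; the function names are illustrative, not part of any proposed API:

```python
def retrieve(query):
    # Stand-in for an OpenSearch (semantic) search returning trusted passages.
    return ["Abraham Lincoln was born on February 12, 1809."]

def generate(prompt):
    # Stand-in for a call to an externally hosted LLM.
    return "Lincoln was born on February 12, 1809."

def rag_answer(question):
    """Retrieve first, then ask the LLM to answer from the retrieved text.

    Grounding the generation step in retrieved passages (rather than the
    model's training data) is what supplies private, up-to-date context
    and curbs hallucination.
    """
    passages = retrieve(question)
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}"
    return generate(prompt)

print(rag_answer("When was Abraham Lincoln born?"))
```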
We believe OpenSearch could be a great platform for building conversational search applications, and aligns well with the RAG approach. It already offers semantic search capabilities using its vector database and k-NN plug-in, alongside enterprise-grade security and scalability. This is a great building block for the “source of truth” information retrieval component of RAG. However, it currently lacks the primitives and crisp APIs to easily enable the conversational element.
Although there are libraries that allow for building this functionality at the application layer (e.g. LangChain), we believe the best developer experience would be to enable this directly in OpenSearch. We consider the “G” in a RAG pipeline as LLM-based post-processing to enable direct question answering, summarization, and a conversational experience on top of OpenSearch semantic search. This enables end-users to interact with their data in OpenSearch in new ways. Furthermore, we believe developers may want to use different LLMs, and that the choice of model should be pluggable.
Through using plugins and search pipelines, we propose an architecture in this RFC to expose easily consumable APIs for conversational search, history, and storage. We segment it into a few components, including: 1/search query rewriting using generative AI and conversational context, 2/question answering and summarization of OpenSearch semantic search queries using generative AI, and 3/a concept of “conversational memory” to easily store the state of conversations and add additional interactions. Conversational Memory will also support conversational applications that have multiple agents operating together, giving a single source of truth for conversation state.
Goals
1/ Developers can easily build conversational search applications (e.g. knowledge-base search, informational chatbot, etc.) using OpenSearch and their choice of generative AI model using well-defined REST APIs. Some of these applications will be an ongoing conversation, while others will be one-shot (and the history of interactions is not important).
2/ Developers can use OpenSearch to support multi-agent conversational architectures, which require a single “source of truth” for conversational history. Multi-agent architectures will have other agents besides that for semantic search with OpenSearch (e.g. an agent that queries the public internet). These developers need an easy API to manage conversational history, both in adding interactions to conversations and exploring history of those conversations.
3/ Developers can easily obtain OpenSearch (semantic) search results alongside the generative AI question answering, so they can show the source documents and enable the end user to explore the source material.
Non-Goals
1/ Building a general LLM application toolkit in OpenSearch. Our goal is just to enable conversational search and the related dependency of conversational memory.
2/ LLM hosting. LLMs take significant resources and should be operated outside of an OpenSearch cluster. We also hope to use the ML-Commons remote inference feature rather than implement our own connectors.
3/ A conversational search application platform. Our goal is to expose crisp APIs to make building applications that use conversational search easy, but not create the end application itself.
Proposed Architecture