[Roadmap] RAG - Githubissues

thinkall commented 4 months ago

Why RAG

Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of LLMs by incorporating a retrieval mechanism into the generative process. This approach lets the model draw on relevant information from a pre-existing knowledge base, which can significantly improve the quality and accuracy of its generated responses. For agent chat, adding a RAG agent can therefore improve the performance and utility of an agent system.

RAG in AutoGen

AutoGen introduced RetrieveUserProxyAgent and RetrieveAssistantAgent for performing RetrieveChat in August 2023 and announced them in a blog post in October 2023. Given a set of documents, the Retrieval-augmented User Proxy first processes the documents automatically: it splits them into chunks and stores the chunks in a vector database. Then, for a given user input, it retrieves relevant chunks as context and sends them to the Retrieval-augmented Assistant, which uses an LLM to generate code or text to answer the question. The agents converse until they find a satisfactory answer.
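To make the flow above concrete, here is a minimal, self-contained sketch of the split, store, retrieve, and prompt steps. It is not AutoGen's implementation: the bag-of-words "embedding", the `InMemoryStore` class, and every function name are assumptions chosen so the example runs with the standard library only.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Naive splitter: consecutive groups of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts. Real systems use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add_document(self, text: str, max_words: int = 40) -> None:
        # Split the document and store each chunk next to its vector.
        for chunk in split_into_chunks(text, max_words):
            self.items.append((chunk, embed(chunk)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank stored chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the message that would be sent on to the assistant."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

In this sketch the user-proxy side would call `add_document` once per document and `retrieve` plus `build_prompt` for each user input; the assistant side would receive the resulting prompt.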

[Figure: retrievechat-arch-old]

As both AutoGen and RAG are evolving very fast, many users are asking for support for customized vector databases, incremental document ingestion, customized retrieval/re-ranking algorithms, customized RAG patterns/workflows, etc. We've addressed some of the issues and feature requests: for example, we've added QdrantRetrieveUserProxyAgent for using Qdrant as the vector db, and we've integrated UNSTRUCTURED to support many unstructured document types. However, there is much more to do.

Our Plan

To better support RAG in AutoGen, we plan to refactor the existing RetrieveChat agents. The goals include:

Primary goals

[ ] Support launching RAG with one agent instead of two
[x] Support customizing vector databases with a parameter instead of extending agent class
[ ] Support RAG in AutoGen Studio
[ ] Support leveraging 3rd-party OSS tools
[ ] Make RAG a capability for any conversable agent
[ ] Support RAG as a tool like in OpenAI Assistant
[x] Make vector db dependency optional
[ ] Make the chat interface of the RAG agent the same as any other conversable agent

Optional goals

[ ] Support async functions
[ ] Support benchmarks
[ ] Support evaluation
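
As an illustration of the goal "customizing vector databases with a parameter instead of extending agent class", the sketch below selects a backend by name or by instance. Everything in it is hypothetical (`VectorDB`, `KeywordDB`, `create_vector_db`, `RagAgent`); it only shows the shape such an interface might take, not the design AutoGen adopted.

```python
from __future__ import annotations

from typing import Callable, Protocol, Union


class VectorDB(Protocol):
    """Hypothetical minimal interface a backend would need to satisfy."""

    def add(self, doc_id: str, text: str) -> None: ...
    def query(self, text: str, k: int) -> list[str]: ...


class KeywordDB:
    """Trivial backend, present only so the sketch is runnable."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def query(self, text: str, k: int) -> list[str]:
        # Score each document by how many query words it shares.
        terms = set(text.lower().split())
        scored = sorted(
            self.docs.items(),
            key=lambda kv: len(terms & set(kv[1].lower().split())),
            reverse=True,
        )
        return [doc_id for doc_id, _ in scored[:k]]


# Name -> factory. A pgvector or Qdrant backend would be registered here.
_REGISTRY: dict[str, Callable[[], VectorDB]] = {"keyword": KeywordDB}


def create_vector_db(spec: Union[str, VectorDB]) -> VectorDB:
    """Accept either a registered name or a ready-made backend instance."""
    if isinstance(spec, str):
        if spec not in _REGISTRY:
            raise ValueError(f"unknown vector db: {spec!r}")
        return _REGISTRY[spec]()
    return spec


class RagAgent:
    """Hypothetical agent: the backend is a constructor parameter, not a subclass."""

    def __init__(self, name: str, vector_db: Union[str, VectorDB] = "keyword") -> None:
        self.name = name
        self.db = create_vector_db(vector_db)
```

With this shape, supporting a new database means registering one more factory, and a user can still pass a custom instance without touching the agent class.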

### Tasks
- [ ] https://github.com/microsoft/autogen/issues/1469
- [ ] https://github.com/microsoft/autogen/issues/1387
- [ ] https://github.com/microsoft/autogen/issues/1440
- [ ] https://github.com/microsoft/autogen/pull/1661
- [ ] https://github.com/microsoft/autogen/issues/1726
- [ ] https://github.com/microsoft/autogen/issues/1047
- [ ] https://github.com/microsoft/autogen/discussions/484
- [ ] Automatically decide whether RAG is needed
- [ ] Add a score threshold for retriever
- [ ] https://github.com/microsoft/autogen/pull/2263
- [ ] https://github.com/microsoft/autogen/pull/2271
- [ ] https://github.com/microsoft/autogen/pull/2289
- [ ] https://github.com/microsoft/autogen/issues/1844
- [ ] https://github.com/microsoft/autogen/issues/531
- [ ] https://github.com/microsoft/autogen/issues/725
- [ ] https://github.com/microsoft/autogen/issues/859
- [ ] https://github.com/microsoft/autogen/issues/1723
- [ ] https://github.com/microsoft/autogen/issues/1282
- [ ] https://github.com/microsoft/autogen/issues/1261
- [ ] https://github.com/microsoft/autogen/pull/2313
- [ ] Add documentation to user guide
- [ ] Delete files from a collection?
- [ ] Add better code parser
- [ ] https://github.com/microsoft/autogen/pull/2942
- [ ] https://github.com/microsoft/autogen/pull/2881
- [ ] https://github.com/microsoft/autogen/pull/2883
- [ ] https://github.com/microsoft/autogen/pull/2865
- [ ] https://github.com/microsoft/autogen/pull/2566
- [ ] https://github.com/microsoft/autogen/pull/2455
- [ ] https://github.com/microsoft/autogen/pull/2259
- [ ] https://github.com/microsoft/autogen/pull/2248
- [ ] https://github.com/microsoft/autogen/issues/3046
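
For the "Add a score threshold for retriever" task, a minimal sketch of the idea follows. It assumes the retriever returns `(chunk, score)` pairs where a higher score means a closer match; for distance-based scores the comparison would be reversed. The function name `filter_by_score` is hypothetical.

```python
from __future__ import annotations


def filter_by_score(results: list[tuple[str, float]], threshold: float) -> list[tuple[str, float]]:
    """Keep only results scoring at least `threshold`, best match first."""
    kept = [(chunk, score) for chunk, score in results if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

An empty result could also serve as a simple signal that no retrieved context is relevant enough to add to the prompt.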

Knucklessg1 commented 4 months ago

Hello, I was wondering if there was support coming for pgvector, if not I would be happy to contribute.

julianakiseleva commented 4 months ago

@thinkall

thinkall commented 3 months ago

> Hello, I was wondering if there was support coming for pgvector, if not I would be happy to contribute.

Hi @Knucklessg1 , contribution is welcome, thank you for your interest!

WaelKarkoub commented 3 months ago

Hi @thinkall, would this PR, https://github.com/microsoft/autogen/pull/2046, help out with Automatically decide whether RAG is needed?

I was thinking if the agent adds a tag like <rag context="some context"> in the message, we can intercept that by one of the hooks or even a reply, and then perform some rag operations

thinkall commented 3 months ago

> Hi @thinkall, would this PR, #2046, help out with Automatically decide whether RAG is needed?
>
> I was thinking if the agent adds a tag like <rag context="some context"> in the message, we can intercept that by one of the hooks or even a reply, and then perform some rag operations

Thank you @WaelKarkoub , interesting idea! Would adding the <rag> tag mean RAG is already performed?

WaelKarkoub commented 3 months ago

@thinkall we could define what that tag means by adding attributes (e.g. <rag context="some context" task="search"> could mean it needs to look through some databases). I'm not fully familiar with how RAG works, but that tag system should be general enough for multiple use cases.
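
A rough sketch of how such a tag could be intercepted: pull every `<rag ...>` tag out of a message and return its attributes so a caller can decide what retrieval to run. The tag format follows the examples in this thread; the function name and the regular expressions are assumptions, and where this would be wired into an agent (a hook or a reply function) is left open.

```python
from __future__ import annotations

import re

# Matches an opening <rag ...> tag and captures its attribute string.
RAG_TAG = re.compile(r"<rag\b([^>]*)>")
# Matches key="value" pairs inside the tag (values must not contain quotes).
ATTR = re.compile(r'(\w+)="([^"]*)"')


def extract_rag_requests(message: str) -> tuple[str, list[dict[str, str]]]:
    """Return the message with <rag ...> tags removed, plus each tag's attributes."""
    requests = [dict(ATTR.findall(m.group(1))) for m in RAG_TAG.finditer(message)]
    cleaned = RAG_TAG.sub("", message).strip()
    return cleaned, requests
```

A message without any tag yields an empty request list, which is one way to express "RAG is not needed for this turn".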

ChristianWeyer commented 3 months ago

Great initiative @thinkall. How much are you thinking of reusing what has already been built and proven in other frameworks, like LangChain?

thinkall commented 3 months ago

> Great initiative @thinkall. How much are you thinking of reusing what has already been built and proven in other frameworks, like LangChain?

Thank you @ChristianWeyer , I hope to support as many established features in the OSS as possible. So we're carefully thinking of the new design of the rag feature in AutoGen. Would you like to share your thoughts?

ChristianWeyer commented 3 months ago

> Great initiative @thinkall. How much are you thinking of reusing what has already been built and proven in other frameworks, like LangChain?
>
> Thank you @ChristianWeyer , I hope to support as many established features in the OSS as possible. So we're carefully thinking of the new design of the rag feature in AutoGen. Would you like to share your thoughts?

One thing is that there are so many connector & retriever implementations in LangChain, that it would not make sense to reinvent the wheel and trying to keep up. Same goes for embedding support.

thinkall commented 3 months ago

> Great initiative @thinkall. How much are you thinking of reusing what has already been built and proven in other frameworks, like LangChain?
>
> Thank you @ChristianWeyer , I hope to support as many established features in the OSS as possible. So we're carefully thinking of the new design of the rag feature in AutoGen. Would you like to share your thoughts?
>
> One thing is that there are so many connector & retriever implementations in LangChain, that it would not make sense to reinvent the wheel and trying to keep up. Same goes for embedding support.

Agree!

Would you like to have a quick chat on this? It would be great to hear more from you!

dsalas-crogl commented 3 months ago

@thinkall Will the upcoming RAG update still require using message_generator in groupchat scenarios? It's my understanding that currently, the RAG agent has to initiate chat and message_generator has to be used, which results in all initial prompt messages being sent through retrieve_docs in RetrieveUserProxyAgent.

ChristianWeyer commented 3 months ago

> Great initiative @thinkall. How much are you thinking of reusing what has already been built and proven in other frameworks, like LangChain?
>
> Thank you @ChristianWeyer , I hope to support as many established features in the OSS as possible. So we're carefully thinking of the new design of the rag feature in AutoGen. Would you like to share your thoughts?
>
> One thing is that there are so many connector & retriever implementations in LangChain, that it would not make sense to reinvent the wheel and trying to keep up. Same goes for embedding support.
>
> Agree!
>
> Would you like to have a quick chat on this? It would be great to hear more from you!

Sure. I am cethewe in AG Discord.

thinkall commented 3 months ago

> Hello, I was wondering if there was support coming for pgvector, if not I would be happy to contribute.
>
> Hi @Knucklessg1 , contribution is welcome, thank you for your interest!

Hi @Knucklessg1 , are you in our Discord channel? Could we have a quick chat? Thanks.

thinkall commented 3 months ago

> @thinkall Will the upcoming RAG update still require using message_generator in groupchat scenarios? It's my understanding that currently, the RAG agent has to initiate chat and message_generator has to be used, which results in all initial prompt messages being sent through retrieve_docs in RetrieveUserProxyAgent.

Hi @dsalas-crogl , I'd like to remove the usage of message_generator, would that benefit your use case? Thanks.

Are you in our Discord channel?

Knucklessg1 commented 3 months ago

> Hello, I was wondering if there was support coming for pgvector, if not I would be happy to contribute.
>
> Hi @Knucklessg1 , contribution is welcome, thank you for your interest!
>
> Hi @Knucklessg1 , are you in our Discord channel? Could we have a quick chat? Thanks.

Yes absolutely. I reached out on Discord.

jamesliu commented 3 months ago

@thinkall any flow diagram regarding the rag?

thinkall commented 3 months ago

> @thinkall any flow diagram regarding the rag?

Hi @jamesliu , there's one diagram here, you can find the workflow details in the Introduction section.

Josephrp commented 3 months ago

Interesting roadmap, and I'm very happy with chromadb; looking forward to an in-memory vector store too. If anyone is interested, it could be a good opportunity to collaborate and break down complex tasks.

I'll also consider creating and sharing an "advanced upsert" agent, which enriches the text chunks to improve retrieval performance.

raolak commented 2 months ago

Are RAG applications limited to document processing, or do they extend to code-related tasks as well? For instance:

Existing code on which we need to run fix? or
New codes (eg. new service) which goes through incremental development

Usecase:

In a system comprised of numerous microservices that are regularly updated with new features or fixes, how can an agent acquire knowledge about these changes? Is RAG a viable method for this?
For the development of new microservices within an existing system, how can knowledge be transferred to agents to enhance the design and implementation process using RAG?

thinkall commented 2 months ago

> Are RAG applications limited to document processing, or do they extend to code-related tasks as well? For instance:
>
> Existing code on which we need to run fix? or
>
> New codes (eg. new service) which goes through incremental development
>
> Usecase:
>
> In a system comprised of numerous microservices that are regularly updated with new features or fixes, how can an agent acquire knowledge about these changes? Is RAG a viable method for this?
>
> For the development of new microservices within an existing system, how can knowledge be transferred to agents to enhance the design and implementation process using RAG?

RAG can help if the documents are well organized. For pure code, currently, you can use 3rd party code chunk methods to help load the code into the vector dbs.

ekzhu commented 2 months ago

Let's also add a documentation task to the roadmap? We should have a RAG category under https://microsoft.github.io/autogen/docs/topics

ChristianWeyer commented 2 months ago

Do we also have a task on the roadmap for using custom embeddings? This is a very vital and important requirement for successful RAG. Another would be to use a re-ranking model optionally to improve RAG results. @thinkall

thinkall commented 2 months ago

> Do we also have a task on the roadmap for using custom embeddings? This is a very vital and important requirement for successful RAG. Another would be to use a re-ranking model optionally to improve RAG results. @thinkall

Custom embeddings are already supported and will also be supported in the new version.

Re-ranking may also be supported, but we may not implement the algorithms ourselves; instead, we could support plugging in different re-ranking models.
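
To illustrate what "plugging in" a re-ranking model could mean, here is a small sketch in which the retriever accepts any callable that reorders candidate chunks. The `Reranker` type, `overlap_reranker`, and `retrieve` are hypothetical names, and the word-overlap scoring is a toy stand-in for a real re-ranking model.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

# A re-ranker takes the query and candidate chunks and returns them reordered.
Reranker = Callable[[str, Sequence[str]], list]


def overlap_reranker(query: str, chunks: Sequence[str]) -> list[str]:
    """Toy re-ranker: order chunks by the number of words shared with the query."""
    terms = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(terms & set(c.lower().split())), reverse=True)


def retrieve(
    query: str,
    candidates: Sequence[str],
    reranker: Optional[Reranker] = None,
    k: int = 3,
) -> list[str]:
    """Return the top-k candidates, reordered first if a re-ranker is supplied."""
    ranked = reranker(query, candidates) if reranker else list(candidates)
    return ranked[:k]
```

Because the re-ranker is just a callable, a cross-encoder or a hosted re-ranking service could be swapped in without changing `retrieve`.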

maximedupre commented 2 months ago

> Are RAG applications limited to document processing, or do they extend to code-related tasks as well? For instance:
>
> Existing code on which we need to run fix? or
>
> New codes (eg. new service) which goes through incremental development
>
> Usecase:
>
> In a system comprised of numerous microservices that are regularly updated with new features or fixes, how can an agent acquire knowledge about these changes? Is RAG a viable method for this?
>
> For the development of new microservices within an existing system, how can knowledge be transferred to agents to enhance the design and implementation process using RAG?
>
> RAG can help if the documents are well organized. For pure code, currently, you can use 3rd party code chunk methods to help load the code into the vector dbs.

@thinkall Is it possible for you to provide an example of a 3rd party code chunk method? I'm very interested in extending the knowledge of an agent to my whole codebase :)

raolak commented 2 months ago

I’ve got some thoughts on how we could use agents to automate code generation

+1 for Maxime, it would be really helpful to include a section on code chunking with examples to illustrate the working.

Now, when it comes to generating code, agents shouldn't just spit out code, they should mimic the way engineers think and work in real life. Here’s what I mean:

Start by clearly defining the requirements—think OKRs that cascade from key results down to epics and stories.
Set up milestones and break down tasks among the agents involved.
Have each agent carry out their tasks and check in regularly to ensure everything’s on track, with room for human oversight when needed.
Provide visibility of OKRs, milestones, and tasks in one place. Make a planner central for agent and human collaboration and progress tracking.
Keep the agent's execution isolated (in a separate process). Maybe a distributed workflow where each node can host one or more agents. The workflow is orchestrated through a central planner contributed to by agents and humans.

For the code workflow, we could see something like this:

Set up a new GitHub project with a clean, well-structured setup and isolated code (for new projects).
Handle resource creation, both on-premises and in the cloud.
Back it all up with a CI/CD system tailored for both on-premises and cloud environments.
Support incremental code commits through PR with CI/CD

These steps would be helpful both for rolling out new features or fixes to existing projects and for starting fresh ones. I managed to get a mini reference implementation of a distributed key-value store (80%) using ChatGPT (GPT-4) and was able to build, test, and run the services locally (screenshot attached). I was experimenting with AutoGen to reproduce the steps that I followed and to see if I can achieve a decent level of autonomy (I am sure it will take many iterations :) ). I am still learning and experimenting, and I will share my findings as I make progress. [image: Screenshot 2024-04-13 at 10.37.18 AM.png]

Thanks for all the great work and support.

Regards lnr


cforce commented 2 months ago

@raolak

A solution for this has just been released. Check out https://github.com/princeton-nlp/SWE-agent

thinkall commented 2 months ago

> @thinkall Is it possible for you to provide an example of a 3rd party code chunk method? I'm very interested in extending the knowledge of an agent to my whole codebase :)

Hi @maximedupre , please check out an example of using 3rd party chunk method here: https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat/#customizing-text-split-function
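
As a rough idea of what a code-aware chunk method might look like, the sketch below splits Python source into one chunk per top-level function or class using the standard `ast` module. It is not the method from the linked post; `split_python_source` is a hypothetical name, and it assumes the split hook accepts a callable that maps text to a list of chunks. Comments and blank lines between top-level statements are dropped in this simple version.

```python
from __future__ import annotations

import ast


def split_python_source(source: str) -> list[str]:
    """One chunk per top-level function/class, plus one chunk of other module-level code."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[str] = []
    leftover: list[str] = []
    for node in tree.body:
        # Start at the first decorator, if any, so decorators stay with their definition.
        start = min([node.lineno] + [d.lineno for d in getattr(node, "decorator_list", [])])
        segment = "\n".join(lines[start - 1:node.end_lineno])
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(segment)
        else:
            leftover.append(segment)
    if leftover:
        # Imports and other module-level statements go into a single leading chunk.
        chunks.insert(0, "\n".join(leftover))
    return chunks
```

Keeping each function or class whole tends to give the retriever chunks that are meaningful on their own, which a fixed-size character splitter cannot guarantee.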

microsoft / autogen

[Roadmap] RAG #1657
