langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

[RAG template] Multi-modal #12732

Closed rlancemartin closed 5 months ago

rlancemartin commented 8 months ago

Feature request

A template that features multi-modal RAG.

Motivation

Your contribution

Multi-modal RAG

dosubot[bot] commented 8 months ago

🤖

Hello @rlancemartin,

I'm Dosu, a bot here to assist you with your inquiries about LangChain. I can help you solve bugs, answer questions, and guide you on how to contribute to the project while you're waiting for a human maintainer.

Regarding your request for a multi-modal Retrieval-Augmented Generation (RAG) template, I've prepared a Python code snippet that should serve as a starting point. Using the LangChain framework, the template:

1. Partitions a PDF document into tables and texts.
2. Summarizes the tables.
3. Adds the texts and table summaries to a multi-vector retriever that can handle both text and image embeddings.
4. Sets up the RAG pipeline and adds typing for the input.

The LLM used in this example is ChatOpenAI with the model "gpt-4", and the prompt template includes both text and tables in the context. The multi-vector retriever, RAG prompt, LLM, and RAG chain are all part of the LangChain framework.

# Python code snippet
# ...
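The snippet itself is elided above, but the multi-vector pattern it describes can be illustrated with a minimal, self-contained sketch. This is not the actual template: the toy retriever, the word-overlap "similarity", and the hard-coded summaries below are simplifications standing in for a real vectorstore plus docstore wired into LangChain's MultiVectorRetriever, real embeddings, and LLM-generated table summaries. The core idea is the same: index small summaries, but return the full parent elements.

```python
import uuid

def word_overlap(a: str, b: str) -> int:
    """Crude stand-in for vector similarity: count of shared words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

class ToyMultiVectorRetriever:
    """Index summaries for search, but hand back the full raw elements."""
    def __init__(self):
        self.summary_index = {}  # doc_id -> summary (what gets "embedded")
        self.docstore = {}       # doc_id -> full raw element (text or table)

    def add(self, raw_element: str, summary: str) -> str:
        doc_id = str(uuid.uuid4())
        self.summary_index[doc_id] = summary
        self.docstore[doc_id] = raw_element
        return doc_id

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        ranked = sorted(
            self.summary_index,
            key=lambda d: word_overlap(query, self.summary_index[d]),
            reverse=True,
        )
        return [self.docstore[d] for d in ranked[:k]]

# Simulated output of partitioning a PDF into texts and tables
# (the template would produce these with a PDF-partitioning step).
texts = ["LangChain supports retrieval augmented generation over documents."]
tables = ["| model | context |\n| gpt-4 | 8k |"]

retriever = ToyMultiVectorRetriever()
for t in texts:
    retriever.add(t, summary=t)  # plain texts are indexed as-is
for tbl in tables:
    # In the real flow an LLM writes this summary; hard-coded here.
    retriever.add(tbl, summary="table of model context sizes")

# The query matches the table *summary*, but the raw table comes back,
# ready to be stuffed into the RAG prompt's context.
docs = retriever.retrieve("What context size does the model have?")
```

The design point the toy version shows is the decoupling: what you embed (a compact summary) need not be what you generate over (the full table), which is exactly why summarizing tables before indexing helps retrieval.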

You can find more details about the MultiVectorRetriever and ChatOpenAI classes in the LangChain codebase. The MultiVectorRetriever class retrieves documents from a set of multiple embeddings for the same document, while the ChatOpenAI class wraps OpenAI's chat large language model API.

Please note that this is a starting point and you may need to adjust the code to fit your specific use case. If you have any further questions or need additional assistance, feel free to ask.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] commented 5 months ago

Hi, @rlancemartin,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue you raised requested a template for multi-modal RAG to support unstructured document processing and multimodal embeddings. In response, I provided a Python code snippet using the LangChain framework to create a multi-modal RAG: it partitions a PDF document into tables and texts, summarizes the tables, sets up a multi-vector retriever and RAG pipeline, and uses a language model. Relevant sources for further details were also provided.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!