langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

how to use LongContextReorder with RetrievalQAChain #10834

Closed imhager closed 6 months ago

imhager commented 1 year ago

Issue you'd like to raise.

In the docs there's a simple demo showing how to use LongContextReorder to reorder docs, but if I want to use it with RetrievalQAChain, how can I do that? Thanks.

Suggestion:

No response

dosubot[bot] commented 1 year ago

🤖

Hello,

To use LongContextReorder with RetrievalQAChain in the LangChain framework, you would need to instantiate LongContextReorder and pass it as an argument to the RetrievalQAChain constructor or method where a document transformer is expected.

Here's a basic example of how you might use it:

from langchain.document_transformers import LongContextReorder
from langchain.chains import RetrievalQAChain

# Instantiate LongContextReorder
reorder = LongContextReorder()

# Instantiate RetrievalQAChain with LongContextReorder
qa_chain = RetrievalQAChain(document_transformer=reorder)

Please note that this is a simplified example and the actual usage might vary depending on the rest of your code.

The LongContextReorder class is used for reordering documents based on their relevance to a given context. It implements a specific reordering strategy known as "Lost in the middle", which is designed to address performance degradation when models must access relevant information in the middle of long contexts. You can find more details about this in the long_context_reorder.py file.
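As an illustration, here's roughly what the standalone usage from the docs looks like (a minimal sketch; the five Document objects stand in for relevance-ranked retrieval results):

from langchain.schema import Document
from langchain.document_transformers import LongContextReorder

# Input is assumed to be sorted by relevance, most relevant first
docs = [Document(page_content=f"doc {i}") for i in range(1, 6)]

reordering = LongContextReorder()
reordered = reordering.transform_documents(docs)

# The most relevant documents now sit at the beginning and end of the
# list, with the least relevant pushed into the middle
print([d.page_content for d in reordered])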

The RetrievalQAChain class is used for creating a question-answering system that retrieves relevant information from a given set of documents. You can find more details about this in the retrieval_qa.py file.
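Worth noting: in the Python library the class is actually RetrievalQA (RetrievalQAChain is the LangChain.js name). A minimal construction sketch, assuming llm and vector_store already exist; as the error report further down shows, the chain does not accept a document_transformer field:

from langchain.chains import RetrievalQA

# Standard construction: the chain takes a retriever directly
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vector_store.as_retriever())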

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

pmespresso commented 11 months ago

@dosu-bot

1 validation error for RetrievalQA
document_transformer
  extra fields not permitted (type=value_error.extra)

devstein commented 11 months ago

Hey @pmespresso! As of now, Dosu only responds to the issue author. Stay tuned for changes to that soon :)

pmespresso commented 11 months ago

btw @imhager here's how I got it to work:

from langchain.chains import ConversationalRetrievalChain
from langchain.document_transformers import LongContextReorder
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import DocumentCompressorPipeline

# Wrap the reorderer in a compressor pipeline so it runs on every retrieval
reordering = LongContextReorder()
pipeline_compressor = DocumentCompressorPipeline(transformers=[reordering])

# Apply the pipeline to the base retriever's results
compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline_compressor,
    base_retriever=vector_store.as_retriever(),
)

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=compression_retriever,
    memory=memory,
    return_source_documents=True,
)

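A hypothetical invocation of that chain (the question string is made up; note that with return_source_documents=True, the memory generally needs to be created with output_key="answer" so the chain knows which output to store):

result = qa({"question": "How should I reorder retrieved documents?"})
print(result["answer"])
for doc in result["source_documents"]:
    print(doc.metadata)
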
imhager commented 11 months ago

> btw @imhager here's how I got it to work: […]

Thanks, I'll try it.

pmespresso commented 11 months ago

welp OpenAI now has 128k context lengths so I think this is all useless now

imhager commented 11 months ago

> welp OpenAI now has 128k context lengths so I think this is all useless now

I agree. As the context length supported by models grows, the need for reordering decreases.

weissenbacherpwc commented 10 months ago

Why do you think this is the case? The longer the context is, the more likely it is that "lost in the middle" occurs, if I understood this correctly.

imhager commented 10 months ago

> Why do you think this is the case? The longer the context is, the more likely it is that "lost in the middle" occurs, if I understood this correctly.

Yes, you're right. My scenario is reordering the query results from a vector store, hoping for better results.
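
For that scenario, the reordering can also be applied directly to the retriever output, without any chain (a minimal sketch; query, vector_store, and the k value are placeholders):

from langchain.document_transformers import LongContextReorder

retriever = vector_store.as_retriever(search_kwargs={"k": 10})
docs = retriever.get_relevant_documents(query)

# Most relevant documents move to the start and end of the list;
# the least relevant end up in the middle
reordered_docs = LongContextReorder().transform_documents(docs)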